181 |
181 |
182 Because of these problems to do with representing automata, there seems |
182 Because of these problems to do with representing automata, there seems |
183 to be no substantial formalisation of automata theory and regular languages |
183 to be no substantial formalisation of automata theory and regular languages |
184 carried out in HOL-based theorem provers. Nipkow establishes in |
184 carried out in HOL-based theorem provers. Nipkow establishes in |
185 \cite{Nipkow98} the link between regular expressions and automata in |
185 \cite{Nipkow98} the link between regular expressions and automata in |
186 the context of lexing. The only larger formalisations of automata theory |
186 the context of lexing. Berghofer and Reiter formalise automata working over |
|
187 bit strings in the context of Presburger arithmetic \cite{BerghoferReiter09}. |
|
188 The only larger formalisations of automata theory |
187 are carried out in Nuprl \cite{Constable00} and in Coq (for example |
189 are carried out in Nuprl \cite{Constable00} and in Coq (for example |
188 \cite{Filliatre97}). |
190 \cite{Filliatre97}). |
189 |
191 |
190 In this paper, we will not attempt to formalise automata theory in |
192 In this paper, we will not attempt to formalise automata theory in |
191 Isabelle/HOL, but take a completely different approach to regular |
193 Isabelle/HOL, but take a completely different approach to regular |
844 |
846 |
845 |
847 |
846 section {* Myhill-Nerode, Second Part *} |
848 section {* Myhill-Nerode, Second Part *} |
847 |
849 |
848 text {* |
850 text {* |
849 TO BE DONE |
|
850 |
|
851 |
851 |
852 \begin{theorem} |
852 \begin{theorem} |
853 Given @{text "r"} is a regular expressions, then @{thm Myhill_Nerode2}. |
853 Given @{text "r"} is a regular expressions, then @{thm Myhill_Nerode2}. |
854 \end{theorem} |
854 \end{theorem} |
855 |
855 |
856 % \begin{proof} |
856 \begin{proof} |
857 % By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY} |
857 By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY} |
858 % and @{const CHAR} are straightforward, because we can easily establish |
858 and @{const CHAR} are straightforward, because we can easily establish |
859 |
859 |
860 % \begin{center} |
860 \begin{center} |
861 % \begin{tabular}{l} |
861 \begin{tabular}{l} |
862 % @{thm quot_null_eq}\\ |
862 @{thm quot_null_eq}\\ |
863 % @{thm quot_empty_subset}\\ |
863 @{thm quot_empty_subset}\\ |
864 % @{thm quot_char_subset} |
864 @{thm quot_char_subset} |
865 % \end{tabular} |
865 \end{tabular} |
866 % \end{center} |
866 \end{center} |
867 % |
867 |
868 % \end{proof} |
868 \end{proof} |
869 |
869 |
870 |
870 |
871 % @{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]} |
871 @{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]} |
872 |
872 |
873 % @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]} |
873 @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]} |
874 |
874 |
875 % @{thm tag_str_STAR_def[where ?L1.0="A"]} |
875 @{thm tag_str_STAR_def[where ?L1.0="A"]} |
876 *} |
876 *} |
877 |
877 |
878 |
878 |
879 section {* Conclusion and Related Work *} |
879 section {* Conclusion and Related Work *} |
880 |
880 |
900 @{term "s\<^isub>1 \<approx>A s\<^isub>2"} if and only if @{term "s\<^isub>1 \<approx>(-A) s\<^isub>2"} |
900 @{term "s\<^isub>1 \<approx>A s\<^isub>2"} if and only if @{term "s\<^isub>1 \<approx>(-A) s\<^isub>2"} |
901 \end{center} |
901 \end{center} |
902 |
902 |
903 \noindent |
903 \noindent |
904 holds for any strings @{text "s\<^isub>1"} and @{text |
904 holds for any strings @{text "s\<^isub>1"} and @{text |
905 "s\<^isub>2"}. Therefore @{text A} and @{term "-A"} give rise to the same |
905 "s\<^isub>2"}. Therefore @{text A} and the complement language @{term "-A"} give rise to the same |
906 partitions. |
906 partitions. Proving the existence of such a regular expression via automata would |
907 Proving the same result via automata would be quite involved. It includes the |
907 be quite involved. It includes the |
908 steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text |
908 steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text |
909 "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"} |
909 "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"} |
910 regular expression. |
910 regular expression. |
911 |
911 |
912 Our formalisation consists of ??? lines of Isabelle/Isar code for the first |
912 Our formalisation consists of ??? lines of Isabelle/Isar code for the first |
913 direction and ??? for the second. While this might be seen as too large to |
913 direction and ??? for the second. While this might be seen as too large to |
914 count as a concise proof pearl, this should be seen in the context of the |
914 count as a concise proof pearl, this should be seen in the context of the |
915 work done by Constable at al \cite{Constable00} who formalised the |
915 work done by Constable at al \cite{Constable00} who formalised the |
916 Myhill-Nerode theorem in Nuprl using automata. This is the most |
916 Myhill-Nerode theorem in Nuprl using automata. |
917 They write that their |
917 They write that their |
918 four-member team needed something on the magnitute of 18 months to formalise |
918 four-member team needed something on the magnitute of 18 months for their |
919 the Myhill-Nerode theorem. The estimate for our formalisation is that we |
919 formalisation. The estimate for our formalisation is that we |
920 needed approximately 3 months and this included the time to find our proof |
920 needed approximately 3 months and this included the time to find our proof |
921 arguments. Unlike Constable et al, who were able to follow the proofs from |
921 arguments. Unlike Constable et al, who were able to follow the proofs from |
922 \cite{???}, we had to find our own arguments. So for us the formalisation |
922 \cite{HopcroftUllman69}, we had to find our own arguments. So for us the formalisation |
923 was not the bottleneck. It is hard to gauge the size of a formalisation in |
923 was not the bottleneck. It is hard to gauge the size of a formalisation in |
924 Nurpl, but from what is shown in the Nuprl Math Library about their |
924 Nurpl, but from what is shown in the Nuprl Math Library about their |
925 development it seems substantially larger than ours. The code of |
925 development it seems substantially larger than ours. The code of |
926 ours can be found under |
926 ours can be found under |
927 |
927 |
932 |
932 |
933 Our proof of the first direction is very much inspired by \emph{Brzozowski's |
933 Our proof of the first direction is very much inspired by \emph{Brzozowski's |
934 algebraic mehod} used to convert a finite automaton to a regular |
934 algebraic mehod} used to convert a finite automaton to a regular |
935 expression \cite{Brzozowski64}. The close connection can be seen by considering the equivalence |
935 expression \cite{Brzozowski64}. The close connection can be seen by considering the equivalence |
936 classes as the states of the minimal automaton for the regular language. |
936 classes as the states of the minimal automaton for the regular language. |
937 However there are some subtle differences. If we identify equivalence |
937 However there are some subtle differences. Since we identify equivalence |
938 classes with the states of the automaton, then the most natural choice is to |
938 classes with the states of the automaton, then the most natural choice is to |
939 characterise each state with the set of strings starting from the initial |
939 characterise each state with the set of strings starting from the initial |
940 state leading up to that state. Usually, however, the states are characterised as the |
940 state leading up to that state. Usually, however, the states are characterised as the |
941 ones starting from that state leading to the terminal states. The first |
941 ones starting from that state leading to the terminal states. The first |
942 choice has consequences how the initial equational system is set up. We have |
942 choice has consequences how the initial equational system is set up. We have |
946 |
946 |
947 We briefly considered using the method Brzozowski presented in the Appendix |
947 We briefly considered using the method Brzozowski presented in the Appendix |
948 of~\cite{Brzozowski64} in order to prove the second direction of the |
948 of~\cite{Brzozowski64} in order to prove the second direction of the |
949 Myhill-Nerode theorem. There he calculates the derivatives for regular |
949 Myhill-Nerode theorem. There he calculates the derivatives for regular |
950 expressions and shows that there can be only finitely many of them. We could |
950 expressions and shows that there can be only finitely many of them. We could |
951 use as the tag of a string @{text s} the derivative of a regular expression |
951 have used as the tag of a string @{text s} the derivative of a regular expression |
952 generated with respect to @{text s}. Using the fact that two strings are |
952 generated with respect to @{text s}. Using the fact that two strings are |
953 Myhill-Nerode related whenever their derivative is the same together with |
953 Myhill-Nerode related whenever their derivative is the same together with |
954 the fact that there are only finitely many derivatives for a regular |
954 the fact that there are only finitely many derivatives for a regular |
955 expression would give us the same argument as ours. However it seems not so easy to |
955 expression would give us the same argument as ours. However it seems not so easy to |
956 calculate the derivatives and then to count them. Therefore we preferred our |
956 calculate the derivatives and then to count them. Therefore we preferred our |