269 |
269 |
270 Central to our proof will be the solution of equational systems |
270 Central to our proof will be the solution of equational systems |
271 involving equivalence classes of languages. For this we will use Arden's lemma \cite{Brzozowski64}, |
271 involving equivalence classes of languages. For this we will use Arden's lemma \cite{Brzozowski64}, |
272 which solves equations of the form @{term "X = A ;; X \<union> B"} provided |
272 which solves equations of the form @{term "X = A ;; X \<union> B"} provided |
273 @{term "[] \<notin> A"}. However we will need the following `reverse' |
273 @{term "[] \<notin> A"}. However we will need the following `reverse' |
274 version of Arden's lemma. |
274 version of Arden's lemma (`reverse' in the sense of changing the order of @{text "\<cdot>"}). |
275 |
275 |
276 \begin{lemma}[Reverse Arden's Lemma]\label{arden}\mbox{}\\ |
276 \begin{lemma}[Reverse Arden's Lemma]\label{arden}\mbox{}\\ |
277 If @{thm (prem 1) arden} then |
277 If @{thm (prem 1) arden} then |
278 @{thm (lhs) arden} if and only if |
278 @{thm (lhs) arden} if and only if |
279 @{thm (rhs) arden}. |
279 @{thm (rhs) arden}. |
736 @{thm (concl) Subst_all_satisfies_invariant} |
736 @{thm (concl) Subst_all_satisfies_invariant} |
737 \end{center} |
737 \end{center} |
738 |
738 |
739 \noindent |
739 \noindent |
740 Finiteness is straightforward, as @{const Subst} and @{const Arden} operations |
740 Finiteness is straightforward, as @{const Subst} and @{const Arden} operations |
741 keep the equational system finite. These operation also preserve soundness |
741 keep the equational system finite. These operations also preserve soundness |
742 and distinctness (we proved soundness for @{const Arden} in Lem.~\ref{ardenable}). |
742 and distinctness (we proved soundness for @{const Arden} in Lem.~\ref{ardenable}). |
743 The property ardenable is clearly preserved because the append-operation |
743 The property ardenable is clearly preserved because the append-operation |
744 cannot make a regular expression to match the empty string. Validity is |
744 cannot make a regular expression to match the empty string. Validity is |
745 given because @{const Arden} removes an equivalence class from @{text yrhs} |
745 given because @{const Arden} removes an equivalence class from @{text yrhs} |
746 and then @{const Subst_all} removes @{text Y} from the equational system. |
746 and then @{const Subst_all} removes @{text Y} from the equational system. |
846 |
846 |
847 |
847 |
848 section {* Myhill-Nerode, Second Part *} |
848 section {* Myhill-Nerode, Second Part *} |
849 |
849 |
850 text {* |
850 text {* |
|
851 We will prove in this section the second part of the Myhill-Nerode |
|
852 theorem. It can be formulated in our setting as follows. |
851 |
853 |
852 \begin{theorem} |
854 \begin{theorem} |
853 Given @{text "r"} is a regular expressions, then @{thm Myhill_Nerode2}. |
855 Given @{text "r"} is a regular expressions, then @{thm Myhill_Nerode2}. |
854 \end{theorem} |
856 \end{theorem} |
855 |
857 |
856 \begin{proof} |
858 \noindent |
857 By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY} |
859 The proof will be by induction on the structure of @{text r}. It turns out |
858 and @{const CHAR} are straightforward, because we can easily establish |
860 the base cases are straightforward. |
|
861 |
|
862 |
|
863 \begin{proof}[Base Cases] |
|
864 The cases for @{const NULL}, @{const EMPTY} and @{const CHAR} are routine, because |
|
865 we can easily establish |
859 |
866 |
860 \begin{center} |
867 \begin{center} |
861 \begin{tabular}{l} |
868 \begin{tabular}{l} |
862 @{thm quot_null_eq}\\ |
869 @{thm quot_null_eq}\\ |
863 @{thm quot_empty_subset}\\ |
870 @{thm quot_empty_subset}\\ |
864 @{thm quot_char_subset} |
871 @{thm quot_char_subset} |
865 \end{tabular} |
872 \end{tabular} |
866 \end{center} |
873 \end{center} |
867 |
874 |
|
875 \noindent |
|
876 hold, which shows that @{term "UNIV // \<approx>(L r)"} must be finite.\qed |
868 \end{proof} |
877 \end{proof} |
869 |
878 |
|
879 \noindent |
|
880 Much more interesting are the inductive cases, which seem hard to be solved |
|
881 directly. The reader is invited to try. Our method will rely on some |
|
882 \emph{tagging} functions of strings. Given the inductive hypothesis, it will |
|
883 be easy to prove that the range of these tagging functions is finite. |
870 |
884 |
871 @{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]} |
885 @{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]} |
872 |
886 |
873 @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]} |
887 @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]} |
874 |
888 |
906 partitions. Proving the existence of such a regular expression via automata would |
920 partitions. Proving the existence of such a regular expression via automata would |
907 be quite involved. It includes the |
921 be quite involved. It includes the |
908 steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text |
922 steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text |
909 "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"} |
923 "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"} |
910 regular expression. |
924 regular expression. |
|
925 |
|
926 While regular expressions are convenient in formalisations, they have some |
|
927 limitations. One is that there seems to be no notion of a minimal regular |
|
928 expression, like there is for automata. For an implementation of a simple |
|
929 regular expression matcher, whose correctness has been formally |
|
930 established, we refer the reader to Owens and Slind \cite{OwensSlind08}. |
|
931 |
911 |
932 |
912 Our formalisation consists of ??? lines of Isabelle/Isar code for the first |
933 Our formalisation consists of ??? lines of Isabelle/Isar code for the first |
913 direction and ??? for the second. While this might be seen as too large to |
934 direction and ??? for the second. While this might be seen as too large to |
914 count as a concise proof pearl, this should be seen in the context of the |
935 count as a concise proof pearl, this should be seen in the context of the |
915 work done by Constable at al \cite{Constable00} who formalised the |
936 work done by Constable at al \cite{Constable00} who formalised the |
958 is also where our method shines, because we can completely side-step the |
979 is also where our method shines, because we can completely side-step the |
959 standard argument \cite{Kozen97} where automata need to be composed, which |
980 standard argument \cite{Kozen97} where automata need to be composed, which |
960 as stated in the Introduction is not so convenient to formalise in a |
981 as stated in the Introduction is not so convenient to formalise in a |
961 HOL-based theorem prover. |
982 HOL-based theorem prover. |
962 |
983 |
963 While regular expressions are convenient in formalisations, they have some |
|
964 limitations. One is that there seems to be no notion of a minimal regular |
|
965 expression, like there is for automata. For an implementation of a simple |
|
966 regular expression matcher, whose correctness has been formally |
|
967 established, we refer the reader to Owens and Slind \cite{OwensSlind08}. |
|
968 |
|
969 *} |
984 *} |
970 |
985 |
971 |
986 |
972 (*<*) |
987 (*<*) |
973 end |
988 end |