339 (Definition~\ref{baddef}). This is because we cannot make a |
339 (Definition~\ref{baddef}). This is because we cannot make a |
340 definition in HOL that is only polymorphic in the state type, but |
340 definition in HOL that is only polymorphic in the state type, but |
341 not in the predicate for regularity; and there is no type |
341 not in the predicate for regularity; and there is no type |
342 quantification available in HOL.\footnote{Slind already pointed out |
342 quantification available in HOL.\footnote{Slind already pointed out |
343 this problem in an email to the HOL4 mailing list on 21st April |
343 this problem in an email to the HOL4 mailing list on 21st April |
344 2005.} Coq, for example, has quantification over types and thus can |
344 2005.} Systems such as Coq permit quantification over types and thus can |
345 state such a definition. This has been recently exploited in a |
345 state such a definition. This has been recently exploited in an |
346 slick formalisation of the Myhill-Nerode theorem in Coq by |
346 elegant and short formalisation of the Myhill-Nerode theorem in Coq by |
347 \citeN{XXX}. |
347 \citeN{Smolka13}, but as said this is not avaible to us working in Isabelle/HOL. |
348 |
348 |
349 An alternative, which provides us with a single type for states of automata, |
349 An alternative, which provides us with a single type for states of automata, |
350 is to give every state node an identity, for example a natural number, and |
350 is to give every state node an identity, for example a natural number, and |
351 then be careful to rename these identities apart whenever connecting two |
351 then be careful to rename these identities apart whenever connecting two |
352 automata. This results in clunky proofs establishing that properties are |
352 automata. This results in clunky proofs establishing that properties are |
468 |
468 |
469 \begin{quote} |
469 \begin{quote} |
470 ``{\it \ldots [b]ut is there a rigorous way to even state that?}'' |
470 ``{\it \ldots [b]ut is there a rigorous way to even state that?}'' |
471 \end{quote} |
471 \end{quote} |
472 |
472 |
|
473 \noindent |
|
474 We shall give an answer to these questions with our paper. |
|
475 |
473 %Moreover, no side-conditions will be needed for regular expressions, |
476 %Moreover, no side-conditions will be needed for regular expressions, |
474 %like we need for automata. |
477 %like we need for automata. |
475 |
478 |
476 |
479 |
477 The convenience of regular expressions has |
480 The convenience of regular expressions has |
771 finitely many terms of the form @{text "(Y\<^sub>i\<^sub>j, ATOM c\<^sub>i\<^sub>j)"} in a right-hand side |
774 finitely many terms of the form @{text "(Y\<^sub>i\<^sub>j, ATOM c\<^sub>i\<^sub>j)"} in a right-hand side |
772 since by assumption there are only finitely many |
775 since by assumption there are only finitely many |
773 equivalence classes and only finitely many characters. |
776 equivalence classes and only finitely many characters. |
774 The term @{text "\<lambda>(ONE)"} in the first equation acts as a marker for the initial state, that |
777 The term @{text "\<lambda>(ONE)"} in the first equation acts as a marker for the initial state, that |
775 is the equivalence class |
778 is the equivalence class |
776 containing the empty string @{text "[]"}.\footnote{Note that we mark, roughly speaking, the |
779 containing the empty string @{text "[]"}. |
777 single `initial' state in the equational system, which is different from |
|
778 the method by \citeN{Brzozowski64}, where he marks the |
|
779 `terminal' states. We are forced to set up the equational system in our |
|
780 way, because the Myhill-Nerode Relation determines the `direction' of the |
|
781 transitions---the successor `state' of an equivalence class @{text Y} can |
|
782 be reached by adding a character to the end of @{text Y}. This is also the |
|
783 reason why we have to use our reversed version of Arden's Lemma.} |
|
784 In our running example we have the initial equational system |
780 In our running example we have the initial equational system |
785 % |
781 % |
786 \begin{equation}\label{exmpcs} |
782 \begin{equation}\label{exmpcs} |
787 \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
783 \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
788 @{term "X\<^sub>1"} & @{text "="} & @{text "\<lambda>(ONE)"}\\ |
784 @{term "X\<^sub>1"} & @{text "="} & @{text "\<lambda>(ONE)"}\\ |
791 @{term "X\<^sub>4"} & @{text "="} & @{text "(X\<^sub>1, ATOM b) + (X\<^sub>2, ATOM a) + (X\<^sub>3, ATOM a)"}\\ |
787 @{term "X\<^sub>4"} & @{text "="} & @{text "(X\<^sub>1, ATOM b) + (X\<^sub>2, ATOM a) + (X\<^sub>3, ATOM a)"}\\ |
792 & & \mbox{}\hspace{0mm}@{text "+ (X\<^sub>3, ATOM b) + (X\<^sub>4, ATOM a) + (X\<^sub>4, ATOM b)"}\\ |
788 & & \mbox{}\hspace{0mm}@{text "+ (X\<^sub>3, ATOM b) + (X\<^sub>4, ATOM a) + (X\<^sub>4, ATOM b)"}\\ |
793 \end{tabular}} |
789 \end{tabular}} |
794 \end{equation} |
790 \end{equation} |
795 |
791 |
796 \noindent |
792 Note that we mark, roughly speaking, the |
|
793 single `initial' state in the equational system, which is different from |
|
794 the method by \citeN{Brzozowski64}, where he marks the |
|
795 `terminal' states. We are forced to set up the equational system in our |
|
796 way, because the Myhill-Nerode Relation determines the `direction' of the |
|
797 transitions---the successor `state' of an equivalence class @{text Y} can |
|
798 be reached by adding a character to the end of @{text Y}. If we had defined |
|
799 our equations the `Brzozowski-way', then our variables do not correspond to |
|
800 the equivalence classes generated by the Myhill-Nerode relation. This need of reverse marking is also the |
|
801 reason why we have to use our reversed version of Arden's Lemma. |
|
802 |
797 Overloading the function @{text \<calL>} for the two kinds of terms in the |
803 Overloading the function @{text \<calL>} for the two kinds of terms in the |
798 equational system, we have |
804 equational system, we have |
799 |
805 |
800 \begin{center} |
806 \begin{center} |
801 @{text "\<calL>(Y, r) \<equiv>"} % |
807 @{text "\<calL>(Y, r) \<equiv>"} % |
816 \begin{equation}\label{inv2} |
822 \begin{equation}\label{inv2} |
817 @{text "X\<^sub>1 = \<calL>(Y\<^sub>1\<^sub>1, ATOM c\<^sub>1\<^sub>1) \<union> \<dots> \<union> \<calL>(Y\<^sub>1\<^sub>p, ATOM c\<^sub>1\<^sub>p) \<union> \<calL>(\<lambda>(ONE))"}. |
823 @{text "X\<^sub>1 = \<calL>(Y\<^sub>1\<^sub>1, ATOM c\<^sub>1\<^sub>1) \<union> \<dots> \<union> \<calL>(Y\<^sub>1\<^sub>p, ATOM c\<^sub>1\<^sub>p) \<union> \<calL>(\<lambda>(ONE))"}. |
818 \end{equation} |
824 \end{equation} |
819 |
825 |
820 \noindent |
826 \noindent |
821 holds. The reason for adding the @{text \<lambda>}-marker to our initial equational system is |
827 holds. Again note that the reason for adding the @{text \<lambda>}-marker to our initial equational system is |
822 to obtain this equation: it only holds with the marker, since none of |
828 to obtain this equation: it only holds with the marker, since none of |
823 the other terms contain the empty string. The point of the initial equational system is |
829 the other terms contain the empty string. |
824 that solving it means we will be able to extract a regular expression for every equivalence class. |
830 The point of the initial equational system is |
|
831 that solving it means we will be able to extract a regular expression for every equivalence class |
|
832 generated by the Myhill-Nerode relation. |
825 |
833 |
826 Our representation for the equations in Isabelle/HOL are pairs, |
834 Our representation for the equations in Isabelle/HOL are pairs, |
827 where the first component is an equivalence class (a set of strings) |
835 where the first component is an equivalence class (a set of strings) |
828 and the second component |
836 and the second component |
829 is a set of terms. Given a set of equivalence |
837 is a set of terms. Given a set of equivalence |
1956 \end{equation} |
1964 \end{equation} |
1957 |
1965 |
1958 \noindent |
1966 \noindent |
1959 Carrying this idea through, we must not consider the set of all derivatives, |
1967 Carrying this idea through, we must not consider the set of all derivatives, |
1960 but the one modulo @{text "ACI"}. In principle, this can be done formally, |
1968 but the one modulo @{text "ACI"}. In principle, this can be done formally, |
1961 but it is very painful in a theorem prover (since there is no |
1969 but it is very painful in a theorem prover---since there is no |
1962 direct characterisation of the set of dissimilar derivatives). |
1970 direct characterisation of the set of dissimilar derivatives. Therefore |
|
1971 \citeN{CoquandSiles12} who follow through this idea had to `stand on their heads' and |
|
1972 first formalise the quite intricate notion of \emph{inductively finite sets} in |
|
1973 order to formalise this property. |
1963 |
1974 |
1964 |
1975 |
1965 Fortunately, there is a much simpler approach using \emph{partial |
1976 Fortunately, there is a much simpler approach using \emph{partial |
1966 derivatives}. They were introduced by \citeN{Antimirov95} and can be defined |
1977 derivatives}. They were introduced by \citeN{Antimirov95} and can be defined |
1967 in Isabelle/HOL as follows: |
1978 in Isabelle/HOL as follows: |
2666 our formalisation \cite{myhillnerodeafp11} can be found in the Archive of Formal Proofs at |
2677 our formalisation \cite{myhillnerodeafp11} can be found in the Archive of Formal Proofs at |
2667 \mbox{\url{http://afp.sourceforge.net/entries/Myhill-Nerode.shtml}}. |
2678 \mbox{\url{http://afp.sourceforge.net/entries/Myhill-Nerode.shtml}}. |
2668 |
2679 |
2669 Our future work will focus on formalising the regular expression matchers |
2680 Our future work will focus on formalising the regular expression matchers |
2670 developed by \citeN{Sulzmann12} which generate variable assignments for |
2681 developed by \citeN{Sulzmann12} which generate variable assignments for |
2671 regular expression submatching.\smallskip |
2682 regular expression sub-matching. For this they use both, derivatives and |
|
2683 partial derivatives of regular expressions.\smallskip |
2672 |
2684 |
2673 \noindent |
2685 \noindent |
2674 {\bf Acknowledgements:} |
2686 {\bf Acknowledgements:} |
2675 We are grateful for the comments we received from Larry Paulson. Tobias |
2687 We are grateful for the comments we received from Larry Paulson. Tobias |
2676 Nipkow made us aware of the properties in Theorem~\ref{subseqreg} and Tjark |
2688 Nipkow made us aware of the properties in Theorem~\ref{subseqreg} and Tjark |