diff -r d65d071798ff -r 62fdb4bf7239 Paper/Paper.thy --- a/Paper/Paper.thy Thu Feb 17 13:25:29 2011 +0000 +++ b/Paper/Paper.thy Thu Feb 17 21:30:26 2011 +0000 @@ -33,8 +33,9 @@ transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100) and Setalt ("\<^raw:\ensuremath{\bigplus}>_" [1000] 999) and append_rexp2 ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 100) and - append_rhs_rexp ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 50) - + append_rhs_rexp ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 50) and + uminus ("\<^raw:\ensuremath{\overline{>_\<^raw:}}>" [100] 100) + (*>*) @@ -833,45 +834,85 @@ section {* Myhill-Nerode, Second Part *} text {* + TO BE DONE + \begin{theorem} - Given @{text "r"} is a regular expressions, then @{thm rexp_imp_finite}. + Given @{text "r"} is a regular expressions, then @{thm Myhill_Nerode2}. \end{theorem} - \begin{proof} - By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY} - and @{const CHAR} are straightforward, because we can easily establish +% \begin{proof} +% By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY} +% and @{const CHAR} are straightforward, because we can easily establish - \begin{center} - \begin{tabular}{l} - @{thm quot_null_eq}\\ - @{thm quot_empty_subset}\\ - @{thm quot_char_subset} - \end{tabular} - \end{center} - - \end{proof} +% \begin{center} +% \begin{tabular}{l} +% @{thm quot_null_eq}\\ +% @{thm quot_empty_subset}\\ +% @{thm quot_char_subset} +% \end{tabular} +% \end{center} +% +% \end{proof} - @{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]} +% @{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]} - @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]} +% @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]} - @{thm tag_str_STAR_def[where ?L1.0="A"]} +% @{thm tag_str_STAR_def[where ?L1.0="A"]} *} section {* Conclusion and Related Work *} text {* - In this paper we took the view that a regular language is one where there exists - a regular expression that matches all its strings. For us it was ineteresting to find - out how far we can push this point of view. Having formalised the Myhill-Nerode - theorem means pushed quite far. Having the Myhill-Nerode theorem means we can - formalise much of the textbook results in this subject. + In this paper we took the view that a regular language is one where there + exists a regular expression that matches all its strings. Regular + expressions can be conveniently defined as a datatype in a HOL-based theorem + prover. For us it was therefore interesting to find out how far we can push + this point of view. + + Having formalised the Myhill-Nerode theorem means we + pushed quite far. Using this theorem we can obviously prove when a language + is \emph{not} regular---by establishing that it has infinitely many + equivalence classes generated by the Myhill-Nerode relation (this is usually + the purpose of the pumping lemma \cite{Kozen97}). We can also use it to + establish the standard textbook results about closure properties of regular + languages. Interesting is the case of closure under complement, because + it seems difficult to construct a regular expression for the complement + language by direct means. However the existence can be easily proved using + the Myhill-Nerode theorem since clearly - Our proof of the first direction is very much inspired by \emph{Brz - algebraic mehod} used to convert a finite atomaton to a regular + \begin{center} + @{term "s\<^isub>1 \A s\<^isub>2"} if and only if @{term "s\<^isub>1 \(-A) s\<^isub>2"} + \end{center} + + \noindent + holds for any strings @{text "s\<^isub>1"} and @{text + "s\<^isub>2"}. Therefore @{text A} and @{term "-A"} give rise to the same + partitions. From the closure under complementation follows also the closure + under intersection and set difference by some simple set calculations. + Proving the same result via automata would be quite involved. It includes the + steps: regular expression @{text "\"} non-deterministic automaton @{text + "\"} deterministic automaton @{text "\"} complement automaton @{text "\"} + regular expression. + + Our formalisation consists of ??? lines of Isar code for the first + direction and ??? for the second. While this might be seen as too large + to count as a concise proof pearl, this should be seen in the context + of the work done by Constable at al \cite{Constable00} who formalised + the Myhill-Nerode theorem in Nuprl using automata. They write that + their four-member team needed something on the magnitute of 18 months + to formalise the Myhill-Nerode theorem. Our estimate is that we needed + approximately 3 months for our fomalisation and this included the time + to find our proof arguments, as we could not find them in the literature. + So for us the formalisation was not the bottleneck. It is hard for us + to gauge the size of a formalisation in Nurpl, but from what is shown in + the Nuprl Math Library their development seems substantially larger. + + Our proof of the first direction is very much inspired by \emph{Brzozowski's + algebraic mehod} used to convert a finite automaton to a regular expression. The close connection can be seen by considering the equivalence classes as the states of the minimal automaton for the regular language. However there are some subtle differences. If we identify equivalence @@ -880,30 +921,30 @@ state leading up to that state. Usually the states are characterised as the ones starting from that state leading to the terminal states. The first choice has consequences how the initial equational system is set up. We have - the $\lambda$-term on our ``initial state'', while Brz has it on the + the $\lambda$-term on our ``initial state'', while Brzozowski has it on the terminal states. This means we also need to reverse the direction of Arden's lemma. - We briefly considered using the method Brz presented in the Appendix of ??? - in order to prove the second direction of the Myhill-Nerode thereom. There - he calculates the derivatives for regular expressions and shows that there - can be only finitely many of them. We could use as the tag of a string - @{text s} the derivative of a regular expression generated with respect to - @{text s}. Using the fact that two strings are Myhill-Nerode related - whenever their derivative is the same together with the fact that there are - only finitely many derivatives for a regular expression would give us the - same argument. However it seems not so easy to calculate the derivatives - and then to count them. Therefore we preferred our direct method of - using tagging-functions involving equivalence classes. This is also where - our method shines, because we can completely side-step the standard - argument \cite{Kozen97} where automata need to be composed, which is not so - convenient to formalise in a HOL-based theorem prover. - + We briefly considered using the method Brzozowski presented in the Appendix + of \cite{Brzozowski64} in order to prove the second direction of the + Myhill-Nerode theorem. There he calculates the derivatives for regular + expressions and shows that there can be only finitely many of them. We could + use as the tag of a string @{text s} the derivative of a regular expression + generated with respect to @{text s}. Using the fact that two strings are + Myhill-Nerode related whenever their derivative is the same together with + the fact that there are only finitely many derivatives for a regular + expression would give us the same argument. However it seems not so easy to + calculate the derivatives and then to count them. Therefore we preferred our + direct method of using tagging-functions involving equivalence classes. This + is also where our method shines, because we can completely side-step the + standard argument \cite{Kozen97} where automata need to be composed, which + is not so convenient to formalise in a HOL-based theorem prover. - Lines of code / nuprl + While regular expressions are convenient in formalisations, they have some + limitations. One is that there seems to be no notion of a minimal regular + expression, like there is a notion of a minimal automaton for a regular + expression. - closure properties - *}