Paper/Paper.thy
changeset 112 62fdb4bf7239
parent 111 d65d071798ff
child 113 ec774952190c
equal deleted inserted replaced
111:d65d071798ff 112:62fdb4bf7239
    31   Trn ("'(_, _')" [100, 100] 100) and 
    31   Trn ("'(_, _')" [100, 100] 100) and 
    32   EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and
    32   EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and
    33   transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100) and
    33   transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100) and
    34   Setalt ("\<^raw:\ensuremath{\bigplus}>_" [1000] 999) and
    34   Setalt ("\<^raw:\ensuremath{\bigplus}>_" [1000] 999) and
    35   append_rexp2 ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 100) and
    35   append_rexp2 ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 100) and
    36   append_rhs_rexp ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 50)
    36   append_rhs_rexp ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 50) and
    37 
    37   uminus ("\<^raw:\ensuremath{\overline{>_\<^raw:}}>" [100] 100)
       
    38   
    38 (*>*)
    39 (*>*)
    39 
    40 
    40 
    41 
    41 section {* Introduction *}
    42 section {* Introduction *}
    42 
    43 
   831 
   832 
   832 
   833 
   833 section {* Myhill-Nerode, Second Part *}
   834 section {* Myhill-Nerode, Second Part *}
   834 
   835 
   835 text {*
   836 text {*
       
   837   TO BE DONE
       
   838 
   836 
   839 
   837   \begin{theorem}
   840   \begin{theorem}
   838   Given @{text "r"} is a regular expressions, then @{thm rexp_imp_finite}.
   841   Given @{text "r"} is a regular expressions, then @{thm Myhill_Nerode2}.
   839   \end{theorem}  
   842   \end{theorem}  
   840 
   843 
   841   \begin{proof}
   844 %  \begin{proof}
   842   By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY}
   845 %  By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY}
   843   and @{const CHAR} are straightforward, because we can easily establish
   846 %  and @{const CHAR} are straightforward, because we can easily establish
   844 
   847 
   845   \begin{center}
   848 %  \begin{center}
   846   \begin{tabular}{l}
   849 %  \begin{tabular}{l}
   847   @{thm quot_null_eq}\\
   850 %  @{thm quot_null_eq}\\
   848   @{thm quot_empty_subset}\\
   851 %  @{thm quot_empty_subset}\\
   849   @{thm quot_char_subset}
   852 %  @{thm quot_char_subset}
   850   \end{tabular}
   853 %  \end{tabular}
   851   \end{center}
   854 %  \end{center}
   852 
   855 %
   853   \end{proof}
   856 %  \end{proof}
   854 
   857 
   855 
   858 
   856   @{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]}
   859 %  @{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]}
   857 
   860 
   858   @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]}
   861 %  @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]}
   859  
   862  
   860   @{thm tag_str_STAR_def[where ?L1.0="A"]}
   863 %  @{thm tag_str_STAR_def[where ?L1.0="A"]}
   861 *}
   864 *}
   862 
   865 
   863 
   866 
   864 section {* Conclusion and Related Work *}
   867 section {* Conclusion and Related Work *}
   865 
   868 
   866 text {*
   869 text {*
   867   In this paper we took the view that a regular language is one where there exists 
   870   In this paper we took the view that a regular language is one where there
   868   a regular expression that matches all its strings. For us it was ineteresting to find 
   871   exists a regular expression that matches all its strings. Regular
   869   out how far we can push this point of view. Having formalised the Myhill-Nerode
   872   expressions can be conveniently defined as a datatype in a HOL-based theorem
   870   theorem means pushed quite far. Having the Myhill-Nerode theorem means we can 
   873   prover. For us it was therefore interesting to find out how far we can push
   871   formalise much of the textbook results in this subject. 
   874   this point of view. 
   872 
   875 
   873   Our proof of the first direction is very much inspired by \emph{Brz
   876   Having formalised the Myhill-Nerode theorem means we
   874   algebraic mehod} used to convert a finite atomaton to a regular
   877   pushed quite far. Using this theorem we can obviously prove when a language
       
   878   is \emph{not} regular---by establishing that it has infinitely many
       
   879   equivalence classes generated by the Myhill-Nerode relation (this is usually
       
   880   the purpose of the pumping lemma \cite{Kozen97}).  We can also use it to
       
   881   establish the standard textbook results about closure properties of regular
       
   882   languages. Interesting is the case of closure under complement, because
       
   883   it seems difficult to construct a regular expression for the complement
       
   884   language by direct means. However the existence can be easily proved using
       
   885   the Myhill-Nerode theorem since clearly
       
   886 
       
   887   \begin{center}
       
   888   @{term "s\<^isub>1 \<approx>A s\<^isub>2"} if and only if @{term "s\<^isub>1 \<approx>(-A) s\<^isub>2"}
       
   889   \end{center}
       
   890  
       
   891   \noindent
       
   892   holds for any strings @{text "s\<^isub>1"} and @{text
       
   893   "s\<^isub>2"}. Therefore @{text A} and @{term "-A"} give rise to the same
       
   894   partitions.  From the closure under complementation follows also the closure
       
   895   under intersection and set difference by some simple set calculations.
       
   896   Proving the same result via automata would be quite involved. It includes the
       
   897   steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
       
   898   "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
       
   899   regular expression.
       
   900 
       
   901   Our formalisation consists of ??? lines of Isar code for the first
       
   902   direction and ??? for the second. While this might be seen as too large
       
   903   to count as a concise proof pearl, this should be seen in the context 
       
   904   of the work done by Constable at al \cite{Constable00} who formalised
       
   905   the Myhill-Nerode theorem in Nuprl using automata. They write that 
       
   906   their four-member team needed something on the magnitute of 18 months 
       
   907   to formalise the Myhill-Nerode theorem. Our estimate is that we needed
       
   908   approximately 3 months for our fomalisation and this included the time
       
   909   to find our proof arguments, as we could not find them in the literature.
       
   910   So for us the formalisation was not the bottleneck. It is hard for us 
       
   911   to gauge the size of a formalisation in Nurpl, but from what is shown in
       
   912   the Nuprl Math Library their development seems substantially larger.
       
   913 
       
   914   Our proof of the first direction is very much inspired by \emph{Brzozowski's
       
   915   algebraic mehod} used to convert a finite automaton to a regular
   875   expression. The close connection can be seen by considering the equivalence
   916   expression. The close connection can be seen by considering the equivalence
   876   classes as the states of the minimal automaton for the regular language.
   917   classes as the states of the minimal automaton for the regular language.
   877   However there are some subtle differences. If we identify equivalence
   918   However there are some subtle differences. If we identify equivalence
   878   classes with the states of the automaton, then the most natural choice is to
   919   classes with the states of the automaton, then the most natural choice is to
   879   characterise each state with the set of strings starting from the initial
   920   characterise each state with the set of strings starting from the initial
   880   state leading up to that state. Usually the states are characterised as the
   921   state leading up to that state. Usually the states are characterised as the
   881   ones starting from that state leading to the terminal states.  The first
   922   ones starting from that state leading to the terminal states.  The first
   882   choice has consequences how the initial equational system is set up. We have
   923   choice has consequences how the initial equational system is set up. We have
   883   the $\lambda$-term on our ``initial state'', while Brz has it on the
   924   the $\lambda$-term on our ``initial state'', while Brzozowski has it on the
   884   terminal states. This means we also need to reverse the direction of Arden's
   925   terminal states. This means we also need to reverse the direction of Arden's
   885   lemma.
   926   lemma.
   886 
   927 
   887   We briefly considered using the method Brz presented in the Appendix of ???
   928   We briefly considered using the method Brzozowski presented in the Appendix
   888   in order to prove the second direction of the Myhill-Nerode thereom. There
   929   of \cite{Brzozowski64} in order to prove the second direction of the
   889   he calculates the derivatives for regular expressions and shows that there
   930   Myhill-Nerode theorem. There he calculates the derivatives for regular
   890   can be only finitely many of them. We could use as the tag of a string
   931   expressions and shows that there can be only finitely many of them. We could
   891   @{text s} the derivative of a regular expression generated with respect to
   932   use as the tag of a string @{text s} the derivative of a regular expression
   892   @{text s}.  Using the fact that two strings are Myhill-Nerode related
   933   generated with respect to @{text s}.  Using the fact that two strings are
   893   whenever their derivative is the same together with the fact that there are
   934   Myhill-Nerode related whenever their derivative is the same together with
   894   only finitely many derivatives for a regular expression would give us the
   935   the fact that there are only finitely many derivatives for a regular
   895   same argument. However it seems not so easy to calculate the derivatives
   936   expression would give us the same argument. However it seems not so easy to
   896   and then to count them. Therefore we preferred our direct method of
   937   calculate the derivatives and then to count them. Therefore we preferred our
   897   using tagging-functions involving equivalence classes. This is also where
   938   direct method of using tagging-functions involving equivalence classes. This
   898   our method shines, because we can completely side-step the standard 
   939   is also where our method shines, because we can completely side-step the
   899   argument \cite{Kozen97} where automata need to be composed, which is not so 
   940   standard argument \cite{Kozen97} where automata need to be composed, which
   900   convenient to formalise in a HOL-based theorem prover.
   941   is not so convenient to formalise in a HOL-based theorem prover.
   901   
   942 
   902 
   943   While regular expressions are convenient in formalisations, they have some
   903   Lines of code / nuprl
   944   limitations. One is that there seems to be no notion of a minimal regular
   904 
   945   expression, like there is a notion of a minimal automaton for a regular
   905   closure properties
   946   expression.
   906   
   947 
   907 *}
   948 *}
   908 
   949 
   909 
   950 
   910 (*<*)
   951 (*<*)
   911 end
   952 end