regexp: comparison Paper/Paper.thy


-:d65d071798ff
+:62fdb4bf7239
 Trn ("'(_, _')" [100, 100] 100) and
 EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and
 transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100) and
 Setalt ("\<^raw:\ensuremath{\bigplus}>_" [1000] 999) and
 append_rexp2 ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 100) and
-append_rhs_rexp ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 50)
+append_rhs_rexp ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 50) and
+uminus ("\<^raw:\ensuremath{\overline{>_\<^raw:}}>" [100] 100)
 (*>*)
 section {* Introduction *}
 section {* Myhill-Nerode, Second Part *}
 text {*
+TO BE DONE
 \begin{theorem}
-Given @{text "r"} is a regular expressions, then @{thm rexp_imp_finite}.
+Given @{text "r"} is a regular expression, then @{thm Myhill_Nerode2}.
 \end{theorem}
-\begin{proof}
+%  \begin{proof}
-By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY}
+%  By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY}
-and @{const CHAR} are straightforward, because we can easily establish
+%  and @{const CHAR} are straightforward, because we can easily establish
-\begin{center}
+%  \begin{center}
-\begin{tabular}{l}
+%  \begin{tabular}{l}
-@{thm quot_null_eq}\\
+%  @{thm quot_null_eq}\\
-@{thm quot_empty_subset}\\
+%  @{thm quot_empty_subset}\\
-@{thm quot_char_subset}
+%  @{thm quot_char_subset}
-\end{tabular}
+%  \end{tabular}
-\end{center}
+%  \end{center}
+%
-\end{proof}
+%  \end{proof}
-@{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]}
+%  @{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]}
-@{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]}
+%  @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]}
-@{thm tag_str_STAR_def[where ?L1.0="A"]}
+%  @{thm tag_str_STAR_def[where ?L1.0="A"]}
 *}
 section {* Conclusion and Related Work *}
 text {*
-In this paper we took the view that a regular language is one where there exists
+In this paper we took the view that a regular language is one where there
-a regular expression that matches all its strings. For us it was ineteresting to find
+exists a regular expression that matches all its strings. Regular
-out how far we can push this point of view. Having formalised the Myhill-Nerode
+expressions can be conveniently defined as a datatype in a HOL-based theorem
-theorem means pushed quite far. Having the Myhill-Nerode theorem means we can
+prover. For us it was therefore interesting to find out how far we can push
-formalise much of the textbook results in this subject.
+this point of view.
-Our proof of the first direction is very much inspired by \emph{Brz
+Having formalised the Myhill-Nerode theorem means we
-algebraic mehod} used to convert a finite atomaton to a regular
+pushed it quite far. Using this theorem we can obviously prove when a language
+is \emph{not} regular---by establishing that it has infinitely many
+equivalence classes generated by the Myhill-Nerode relation (this is usually
+the purpose of the pumping lemma \cite{Kozen97}).  We can also use it to
+establish the standard textbook results about closure properties of regular
+languages. Interesting is the case of closure under complement, because
+it seems difficult to construct a regular expression for the complement
+language by direct means. However the existence can be easily proved using
+the Myhill-Nerode theorem since clearly
+\begin{center}
+@{term "s\<^isub>1 \<approx>A s\<^isub>2"} if and only if @{term "s\<^isub>1 \<approx>(-A) s\<^isub>2"}
+\end{center}
+\noindent
+holds for any strings @{text "s\<^isub>1"} and @{text
+"s\<^isub>2"}. Therefore @{text A} and @{term "-A"} give rise to the same
+partitions.  From the closure under complementation follows also the closure
+under intersection and set difference by some simple set calculations.
+Proving the same result via automata would be quite involved. It includes the
+steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
+"\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
+regular expression.
+Our formalisation consists of ??? lines of Isar code for the first
+direction and ??? for the second. While this might be seen as too large
+to count as a concise proof pearl, this should be seen in the context
+of the work done by Constable et al.\ \cite{Constable00} who formalised
+the Myhill-Nerode theorem in Nuprl using automata. They write that
+their four-member team needed something on the magnitude of 18 months
+to formalise the Myhill-Nerode theorem. Our estimate is that we needed
+approximately 3 months for our formalisation and this included the time
+to find our proof arguments, as we could not find them in the literature.
+So for us the formalisation was not the bottleneck. It is hard for us
+to gauge the size of a formalisation in Nuprl, but from what is shown in
+the Nuprl Math Library their development seems substantially larger.
+Our proof of the first direction is very much inspired by \emph{Brzozowski's
+algebraic method} used to convert a finite automaton to a regular
 expression. The close connection can be seen by considering the equivalence
 classes as the states of the minimal automaton for the regular language.
 However there are some subtle differences. If we identify equivalence
 classes with the states of the automaton, then the most natural choice is to
 characterise each state with the set of strings starting from the initial
 state leading up to that state. Usually the states are characterised as the
 ones starting from that state leading to the terminal states.  The first
 choice has consequences for how the initial equational system is set up. We have
-the $\lambda$-term on our ``initial state'', while Brz has it on the
+the $\lambda$-term on our ``initial state'', while Brzozowski has it on the
 terminal states. This means we also need to reverse the direction of Arden's
 lemma.
-We briefly considered using the method Brz presented in the Appendix of ???
+We briefly considered using the method Brzozowski presented in the Appendix
-in order to prove the second direction of the Myhill-Nerode thereom. There
+of \cite{Brzozowski64} in order to prove the second direction of the
-he calculates the derivatives for regular expressions and shows that there
+Myhill-Nerode theorem. There he calculates the derivatives for regular
-can be only finitely many of them. We could use as the tag of a string
+expressions and shows that there can be only finitely many of them. We could
-@{text s} the derivative of a regular expression generated with respect to
+use as the tag of a string @{text s} the derivative of a regular expression
-@{text s}.  Using the fact that two strings are Myhill-Nerode related
+generated with respect to @{text s}.  Using the fact that two strings are
-whenever their derivative is the same together with the fact that there are
+Myhill-Nerode related whenever their derivative is the same together with
-only finitely many derivatives for a regular expression would give us the
+the fact that there are only finitely many derivatives for a regular
-same argument. However it seems not so easy to calculate the derivatives
+expression would give us the same argument. However it seems not so easy to
-and then to count them. Therefore we preferred our direct method of
+calculate the derivatives and then to count them. Therefore we preferred our
-using tagging-functions involving equivalence classes. This is also where
+direct method of using tagging-functions involving equivalence classes. This
-our method shines, because we can completely side-step the standard
+is also where our method shines, because we can completely side-step the
-argument \cite{Kozen97} where automata need to be composed, which is not so
+standard argument \cite{Kozen97} where automata need to be composed, which
-convenient to formalise in a HOL-based theorem prover.
+is not so convenient to formalise in a HOL-based theorem prover.
+While regular expressions are convenient in formalisations, they have some
-Lines of code / nuprl
+limitations. One is that there seems to be no notion of a minimal regular
+expression, like there is a notion of a minimal automaton for a regular
-closure properties
+language.
 *}
 (*<*)
 end

changeset 112	62fdb4bf7239
parent 111	d65d071798ff
child 113	ec774952190c
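
The complement argument in the inserted conclusion text (two strings are Myhill-Nerode related with respect to A exactly when they are related with respect to -A) can be checked on a small example. The following is only a bounded sketch in Python, not part of the Isabelle development: the example language `in_a`, the length bounds, and all function names are assumptions made for illustration, and the relation is tested only against finitely many extensions rather than all strings.

```python
from itertools import product


def strings_up_to(alphabet, max_len):
    """Yield all strings over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def related(member, s1, s2, extensions):
    """Bounded Myhill-Nerode check: s1 and s2 agree on membership
    of s1 + z and s2 + z for every tested extension z."""
    return all(member(s1 + z) == member(s2 + z) for z in extensions)


def in_a(s):
    # Hypothetical example language A: an even number of 'a' characters.
    return s.count("a") % 2 == 0


def in_complement(s):
    # Membership in -A is the negation of membership in A.
    return not in_a(s)


ALPHABET = "ab"
EXTENSIONS = list(strings_up_to(ALPHABET, 3))
SAMPLES = list(strings_up_to(ALPHABET, 3))


def same_relation():
    """True if A and -A relate exactly the same pairs of sample strings."""
    return all(
        related(in_a, s1, s2, EXTENSIONS)
        == related(in_complement, s1, s2, EXTENSIONS)
        for s1 in SAMPLES
        for s2 in SAMPLES
    )


def classes(member):
    """Partition SAMPLES into classes of the bounded relation."""
    result = []
    for s in SAMPLES:
        for cls in result:
            if related(member, s, cls[0], EXTENSIONS):
                cls.append(s)
                break
        else:
            result.append([s])
    return result
```

Negating both sides of each membership comparison leaves every equality unchanged, which is why `same_relation()` holds and `classes(in_a)` and `classes(in_complement)` produce the same partition. The sketch illustrates the idea only; the general statement is what the Isabelle proof establishes.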