regexp: comparison Paper/Paper.thy

equal deleted inserted replaced

-:ec774952190c
+:c5eb5f3065ae
 Because of these problems to do with representing automata, there seems
 to be no substantial formalisation of automata theory and regular languages
 carried out in HOL-based theorem provers. Nipkow establishes in
 \cite{Nipkow98} the link between regular expressions and automata in
-the context of lexing. The only larger formalisations of automata theory
+the context of lexing. Berghofer and Reiter formalise automata working over
+bit strings in the context of Presburger arithmetic \cite{BerghoferReiter09}.
+The only larger formalisations of automata theory
 are carried out in Nuprl \cite{Constable00} and in Coq (for example
 \cite{Filliatre97}).
 In this paper, we will not attempt to formalise automata theory in
 Isabelle/HOL, but take a completely different approach to regular
 section {* Myhill-Nerode, Second Part *}
 text {*
-TO BE DONE
 \begin{theorem}
 Given @{text "r"} is a regular expressions, then @{thm Myhill_Nerode2}.
 \end{theorem}
-%  \begin{proof}
+\begin{proof}
-%  By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY}
+By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY}
-%  and @{const CHAR} are straightforward, because we can easily establish
+and @{const CHAR} are straightforward, because we can easily establish
-%  \begin{center}
+\begin{center}
-%  \begin{tabular}{l}
+\begin{tabular}{l}
-%  @{thm quot_null_eq}\\
+@{thm quot_null_eq}\\
-%  @{thm quot_empty_subset}\\
+@{thm quot_empty_subset}\\
-%  @{thm quot_char_subset}
+@{thm quot_char_subset}
-%  \end{tabular}
+\end{tabular}
-%  \end{center}
+\end{center}
-%
-%  \end{proof}
+\end{proof}
-%  @{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]}
+@{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]}
-%  @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]}
+@{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]}
-%  @{thm tag_str_STAR_def[where ?L1.0="A"]}
+@{thm tag_str_STAR_def[where ?L1.0="A"]}
 *}
 section {* Conclusion and Related Work *}
 @{term "s\<^isub>1 \<approx>A s\<^isub>2"} if and only if @{term "s\<^isub>1 \<approx>(-A) s\<^isub>2"}
 \end{center}
 \noindent
 holds for any strings @{text "s\<^isub>1"} and @{text
-"s\<^isub>2"}. Therefore @{text A} and @{term "-A"} give rise to the same
+"s\<^isub>2"}. Therefore @{text A} and the complement language @{term "-A"} give rise to the same
-partitions.
+partitions.  Proving the existence of such a regular expression via automata would
-Proving the same result via automata would be quite involved. It includes the
+be quite involved. It includes the
 steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
 "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
 regular expression.
 Our formalisation consists of ??? lines of Isabelle/Isar code for the first
 direction and ??? for the second. While this might be seen as too large to
 count as a concise proof pearl, this should be seen in the context of the
 work done by Constable at al \cite{Constable00} who formalised the
-Myhill-Nerode theorem in Nuprl using automata. This is the most
+Myhill-Nerode theorem in Nuprl using automata.
 They write that their
-four-member team needed something on the magnitute of 18 months to formalise
+four-member team needed something on the magnitute of 18 months for their
-the Myhill-Nerode theorem. The estimate for our formalisation is that we
+formalisation. The estimate for our formalisation is that we
 needed approximately 3 months and this included the time to find our proof
 arguments. Unlike Constable et al, who were able to follow the proofs from
-\cite{???}, we had to find our own arguments.  So for us the formalisation
+\cite{HopcroftUllman69}, we had to find our own arguments.  So for us the formalisation
 was not the bottleneck. It is hard to gauge the size of a formalisation in
 Nurpl, but from what is shown in the Nuprl Math Library about their
 development it seems substantially larger than ours. The code of
 ours can be found under
 Our proof of the first direction is very much inspired by \emph{Brzozowski's
 algebraic mehod} used to convert a finite automaton to a regular
 expression \cite{Brzozowski64}. The close connection can be seen by considering the equivalence
 classes as the states of the minimal automaton for the regular language.
-However there are some subtle differences. If we identify equivalence
+However there are some subtle differences. Since we identify equivalence
 classes with the states of the automaton, then the most natural choice is to
 characterise each state with the set of strings starting from the initial
 state leading up to that state. Usually, however, the states are characterised as the
 ones starting from that state leading to the terminal states.  The first
 choice has consequences how the initial equational system is set up. We have
 We briefly considered using the method Brzozowski presented in the Appendix
 of~\cite{Brzozowski64} in order to prove the second direction of the
 Myhill-Nerode theorem. There he calculates the derivatives for regular
 expressions and shows that there can be only finitely many of them. We could
-use as the tag of a string @{text s} the derivative of a regular expression
+have used as the tag of a string @{text s} the derivative of a regular expression
 generated with respect to @{text s}.  Using the fact that two strings are
 Myhill-Nerode related whenever their derivative is the same together with
 the fact that there are only finitely many derivatives for a regular
 expression would give us the same argument as ours. However it seems not so easy to
 calculate the derivatives and then to count them. Therefore we preferred our

changeset 114	c5eb5f3065ae
parent 113	ec774952190c
child 115	c5f138b5fc88