regexp: comparison Journal/Paper.thy

equal deleted inserted replaced

-:01d223421ba0
+:44c4450152e3
 The key definition in the Myhill-Nerode Theorem is the
 \emph{Myhill-Nerode Relation}, which states that w.r.t.~a language two
 strings are related, provided there is no distinguishing extension in this
 language. This can be defined as a ternary relation.
-\begin{dfntn}[(Myhill-Nerode Relation)]\label{myhillneroderel}
+\begin{dfntn}[Myhill-Nerode Relation]\label{myhillneroderel}
 Given a language @{text A}, two strings @{text x} and
 @{text y} are Myhill-Nerode related provided
 \begin{center}
 @{thm str_eq_def'}
 \end{center}
 \noindent
 The proof will be by induction on the structure of @{text r}. It turns out
 the base cases are straightforward.
-\begin{proof}[(Base Cases)]
+\begin{proof}[Base Cases]
 The cases for @{const ZERO}, @{const ONE} and @{const ATOM} are routine, because
 we can easily establish that
 \begin{center}
 \begin{tabular}{l}
 @{term "\<approx>(lang r)"}, which implies that @{term "UNIV // \<approx>(lang r)"} must
 also be finite.  We formally define the notion of a \emph{tagging-relation}
 as follows.
-\begin{dfntn}[(Tagging-Relation)] Given a tagging-function @{text tag}, then two strings @{text x}
+\begin{dfntn}[Tagging-Relation] Given a tagging-function @{text tag}, then two strings @{text x}
 and @{text y} are \emph{tag-related} provided
 \begin{center}
 @{text "x \<^raw:$\threesim$>\<^bsub>tag\<^esub> y \<equiv> tag x = tag y"}\;.
 \end{center}
 \end{dfntn}
 is that we need to establish that @{term "=(tag_Plus A B)="} refines @{term "\<approx>(A \<union> B)"}.
 This amounts to showing @{term "x \<approx>A y"} or @{term "x \<approx>B y"} under the assumption
 @{term "x"}~@{term "=(tag_Plus A B)="}~@{term y}. As we shall see, this definition will
 provide us with just the right assumptions in order to get the proof through.
-\begin{proof}[(@{const "PLUS"}-Case)]
+\begin{proof}[@{const "PLUS"}-Case]
 We can show in general, if @{term "finite (UNIV // \<approx>A)"} and @{term "finite
 (UNIV // \<approx>B)"} then @{term "finite ((UNIV // \<approx>A) \<times> (UNIV // \<approx>B))"}
 holds. The range of @{term "tag_Plus A B"} is a subset of this product
 set---so finite. For the refinement proof-obligation, we know that @{term
 "(\<approx>A `` {x}, \<approx>B `` {x}) = (\<approx>A `` {y}, \<approx>B `` {y})"} holds by assumption. Then
 Note that we have to make the assumption for all suffixes @{text "x\<^isub>s"}, since we do
 not know anything about how the string @{term x} is partitioned.
 With this definition in place, let us prove the @{const "Times"}-case.
-\begin{proof}[(@{const TIMES}-Case)]
+\begin{proof}[@{const TIMES}-Case]
 If @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
 then @{term "finite ((UNIV // \<approx>A) \<times> (Pow (UNIV // \<approx>B)))"} holds. The range of
 @{term "tag_Times A B"} is a subset of this product set, and therefore finite.
 For the refinement of @{term "\<approx>(A \<cdot> B)"} and @{text "\<^raw:$\threesim$>\<^bsub>\<times>tag A B\<^esub>"},
 we have by Lemma \ref{refinement}
 \begin{center}
 @{thm (lhs) tag_Star_def[where ?A="A", THEN meta_eq_app]}~@{text "\<equiv>"}~
 @{text "{\<lbrakk>x\<^isub>s\<rbrakk>\<^bsub>\<approx>A\<^esub> | x\<^isub>p < x \<and> x\<^isub>p \<in> A\<^isup>\<star> \<and> (x\<^isub>p, x\<^isub>s) \<in> Partitions x}"}
 \end{center}
-\begin{proof}[(@{const Star}-Case)]
+\begin{proof}[@{const Star}-Case]
 If @{term "finite (UNIV // \<approx>A)"}
 then @{term "finite (Pow (UNIV // \<approx>A))"} holds. The range of
 @{term "tag_Star A"} is a subset of this set, and therefore finite.
 Again we have to show under the assumption @{term "x"}~@{term "=(tag_Star A)="}~@{term y}
 that @{term "x @ z \<in> A\<star>"} implies @{term "y @ z \<in> A\<star>"}.
 Finally we like to show that the Myhill-Nerode Theorem is also convenient for establishing
 the non-regularity of languages. For this we use the following version of the Continuation
 Lemma (see for example~\cite{Rosenberg06}).
-\begin{lmm}[(Continuation Lemma)]
+\begin{lmm}[Continuation Lemma]
 If a language @{text A} is regular and a set of strings @{text B} is infinite,
 then there exist two distinct strings @{text x} and @{text y} in @{text B}
 such that @{term "x \<approx>A y"}.
 \end{lmm}
 \end{lmm}
 \begin{proof}
 After unfolding the definition of @{text "B"}, we need to establish that given @{term "i \<noteq> j"},
 the strings @{text "a\<^sup>i"} and @{text "a\<^sup>j"} are not Myhill-Nerode related by @{text "A"}.
-That means we have to show that \mbox{@{text "\<forall>z. a\<^sup>i @ z \<in> A = a\<^sup>j @ z \<in> A"}} leads to
+That means we have to show that @{text "\<forall>z. a\<^sup>i @ z \<in> A = a\<^sup>j @ z \<in> A"} leads to
 a contradiction. Let us take @{text "b\<^sup>i"} for @{text "z"}. Then we know @{text "a\<^sup>i @ b\<^sup>i \<in> A"}.
 But since @{term "i \<noteq> j"}, @{text "a\<^sup>j @ b\<^sup>i \<notin> A"}. Therefore  @{text "a\<^sup>i"} and @{text "a\<^sup>j"}
 cannot be Myhill-Nerode related by @{text "A"}, and we are done.
 \end{proof}
 \noindent
 In this paper we took the view that a regular language is one where there
 exists a regular expression that matches all of its strings. Regular
 expressions can conveniently be defined as a datatype in theorem
 provers. For us it was therefore interesting to find out how far we can push
-this point of view. We have established in Isabelle/HOL both directions
+this point of view. But this question whether formal language theory can
+be done without automata crops up also in non-theorem prover contexts. For
+example, Gasarch asked in the Computational Complexity blog by \citeN{GasarchBlog}
+whether the complementation of
+regular-expression languages can be proved without using automata. He concluded
+\begin{quote}
+{\it \ldots it can't be done}
+\end{quote}
+\noindent
+and even pondered
+\begin{quote}
+{\it \ldots [b]ut is there a rigorous way to even state that?}
+\end{quote}
+\noindent
+We have established in Isabelle/HOL both directions
 of the Myhill-Nerode Theorem.
 %
-\begin{thrm}[(Myhill-Nerode Theorem)]\mbox{}\\
+\begin{thrm}[Myhill-Nerode Theorem]\mbox{}\\
 A language @{text A} is regular if and only if @{thm (rhs) Myhill_Nerode}.
 \end{thrm}
 \noindent
 Having formalised this theorem means we pushed our point of view quite
 far. Using this theorem we can obviously prove when a language is \emph{not}
 regular---by establishing that it has infinitely many equivalence classes
 generated by the Myhill-Nerode Relation (this is usually the purpose of the
 Pumping Lemma).  We can also use it to establish the standard
-textbook results about closure properties of regular languages. Interesting
+textbook results about closure properties of regular languages. The case of
-is the case of closure under complement, because it seems difficult to
+closure under complement follows easily from the Myhill-Nerode Theorem.
-construct a regular expression for the complement language by direct
+So our answer to Gasarch is ``{\it it can be done!''}
-means. However the existence of such a regular expression can be easily
-proved using the Myhill-Nerode Theorem.
 %Our insistence on regular expressions for proving the Myhill-Nerode Theorem
 %arose from the problem of defining formally the regularity of a language.
 %In order to guarantee consistency,
 %formalisations in HOL can only extend the logic with definitions that introduce a new concept in
 %regular expressions poses no problem at all for HOL.  We showed in this
 %paper that they can be used for establishing the central result in regular
 %language theory---the Myhill-Nerode Theorem.
 While regular expressions are convenient, they have some limitations. One is
+that there are some regular expressions for which the smallest regular
+expression for the complement language has a doubly-exponential blowup in size
+as shown by \citeN{Gelade12}.
+Another is
 that there seems to be no method of calculating a minimal regular expression
 (for example in terms of length) for a regular language, like there is for
 automata. On the other hand, efficient regular expression matching, without
 using automata, poses no problem as shown by \citeN{OwensReppyTuron09}.  For an
 implementation of a simple regular expression matcher, whose correctness has

changeset 375	44c4450152e3
parent 374	01d223421ba0
child 377	4f303da0cd2a