regexp: changeset 99:54aa3b6dd71c

Binary file Literature/rutten.pdf has changed

--- a/Myhill.thy	Fri Feb 11 13:30:37 2011 +0000
+++ b/Myhill.thy	Sun Feb 13 10:36:53 2011 +0000
@@ -2,7 +2,7 @@
   imports Myhill_2
 begin
 
-section {* Preliminaries *}
+section {* Preliminaries \label{sec_prelim}*}
 
 subsection {* Finite automata and \mht  \label{sec_fa_mh} *}
 
@@ -200,23 +200,132 @@
   ones. For example, the recursive equation \eqref{x_1_def_o} is transformed into the follwing
   non-recursive one:
   \begin{equation}
-     X_1  =  (X_0 \cdot a + X_2 \cdot d) \cdot b^*
+     X_1  =  (X_0 \cdot a + X_2 \cdot d) \cdot b^* = X_0 \cdot (a \cdot b^*) + X_2 \cdot (d \cdot b^*)
   \end{equation}
-  which, by Arden's lemma, still characterizes $X_1$ correctly. By subsitute 
-  $(X_0 \cdot a + X_2 \cdot d) \cdot b^*$ for all $X_1$ and remove  \eqref{x_1_def_o},
+  which, by Arden's lemma, still characterizes $X_1$ correctly. By subsituting
+  $(X_0 \cdot a + X_2 \cdot d) \cdot b^*$ for all $X_1$ and removing  \eqref{x_1_def_o},
   we get:
-
   \begin{subequations}
   \begin{eqnarray}
-    X_0 & = & ((X_0 \cdot a + X_2 \cdot d) \cdot b^*) \cdot c + X_2 \cdot d + \lambda \label{x_0_def_1} \\
-    X_2 & = & X_0 \cdot b + ((X_0 \cdot a + X_2 \cdot d) \cdot b^*) \cdot d + X_2 \cdot a \\
+    X_0 & = & \begin{aligned}
+              & (X_0 \cdot (a \cdot b^*) + X_2 \cdot (d \cdot b^*)) \cdot c + 
+                X_2 \cdot d + \lambda = \\
+              & X_0 \cdot (a \cdot b^* \cdot c) + X_2 \cdot (d \cdot b^* \cdot c) + 
+                X_2 \cdot d + \lambda = \\
+              &  X_0 \cdot (a \cdot b^* \cdot c) + X_2 \cdot (d \cdot b^* \cdot c + d) + \lambda 
+              \end{aligned} \label{x_0_def_1} \\
+    X_2 & = & \begin{aligned}
+                 & X_0 \cdot b + (X_0 \cdot (a \cdot b^*) + X_2 \cdot (d \cdot b^*)) \cdot d + X_2 \cdot a = \\
+                 & X_0 \cdot b + X_0 \cdot (a \cdot b^* \cdot d) + X_2 \cdot (d \cdot b^* \cdot d) + X_2 \cdot a = \\
+                 & X_0 \cdot (b + a \cdot b^* \cdot d) + X_2 \cdot (d \cdot b^* \cdot d + a)
+              \end{aligned} \\
     X_3 & = & \begin{aligned}
                  & X_0 \cdot (c + d + e) + ((X_0 \cdot a + X_2 \cdot d) \cdot b^*) \cdot (a + e)\\
-                 & + X_2 \cdot (b + e) + X_3 \cdot (a + b + c + d + e)
+                 & + X_2 \cdot (b + e) + X_3 \cdot (a + b + c + d + e) \label{x_3_def_1}
               \end{aligned}
   \end{eqnarray}
   \end{subequations}  
- 
+Suppose $X_3$ is the one to remove next, since $X_3$ dose not appear in either $X_0$ or $X_2$, 
+the removal of equation \eqref{x_3_def_1} changes nothing in the rest equations. Therefore, we get:
+ \begin{subequations}
+  \begin{eqnarray}
+    X_0 & = & X_0 \cdot (a \cdot b^* \cdot c) + X_2 \cdot (d \cdot b^* \cdot c + d) + \lambda  \label{x_0_def_2} \\
+    X_2 & = & X_0 \cdot (b + a \cdot b^* \cdot d) + X_2 \cdot (d \cdot b^* \cdot d + a) \label{x_2_def_2}
+  \end{eqnarray}
+  \end{subequations}  
+Actually, since absorbing state like $X_3$ contributes nothing to the language $\Lang$, it could have been removed
+at the very beginning of this precedure without affecting the final result. Now, the last unknown to remove 
+is $X_2$ and the Arden's transformaton of \eqref{x_2_def_2} is:
+\begin{equation} \label{x_2_ardened}
+   X_2 ~ = ~  (X_0 \cdot (b + a \cdot b^* \cdot d)) \cdot (d \cdot b^* \cdot d + a)^* =
+              X_0 \cdot ((b + a \cdot b^* \cdot d) \cdot (d \cdot b^* \cdot d + a)^*)
+\end{equation}
+By substituting the right hand side of \eqref{x_2_ardened} into \eqref{x_0_def_2}, we get:
+\begin{equation}
+\begin{aligned}
+  X_0  & = && X_0 \cdot (a \cdot b^* \cdot c) + \\
+       &   && X_0 \cdot ((b + a \cdot b^* \cdot d) \cdot (d \cdot b^* \cdot d + a)^*) \cdot 
+           (d \cdot b^* \cdot c + d) + \lambda \\
+       & =  && X_0 \cdot ((a \cdot b^* \cdot c) + \\
+       &   && \hspace{3em} ((b + a \cdot b^* \cdot d) \cdot (d \cdot b^* \cdot d + a)^*) \cdot 
+           (d \cdot b^* \cdot c + d)) + \lambda 
+\end{aligned}
+\end{equation}
+By applying Arden's transformation to this, we get the solution of $X_0$ as:
+\begin{equation}
+\begin{aligned}
+  X_0  =  ((a \cdot b^* \cdot c) + 
+                ((b + a \cdot b^* \cdot d) \cdot (d \cdot b^* \cdot d + a)^*) \cdot 
+                        (d \cdot b^* \cdot c + d))^*
+\end{aligned}
+\end{equation}
+Using the same method, solutions for $X_1$ and $X_2$ can be obtained as well and the
+regular expressoin for $\Lang$ is just $X_1 + X_2$. The formalization of this procedure
+consititues the first direction of the {\em regular expression} verion of
+\mht. Detailed explaination are given in {\bf paper.pdf} and more details and comments
+can be found in the formal scripts.
 *}
 
+section {* Direction @{text "finite partition \<Rightarrow> regular language"} *}
+
+text {*
+  It is well known in the theory of regular languages that
+  the existence of finite  language partition amounts to the existence of 
+  a minimal automaton, i.e. the $M_\Lang$ constructed in section \ref{sec_prelim}, which recoginzes the 
+  same language $\Lang$. The standard way to prove the existence of finite language partition 
+  is to construct a automaton out of the regular expression which recoginzes the same language, from
+  which the existence of finite language partition follows immediately. As discussed in the introducton of 
+  {\bf paper.pdf} as well as in [5], the problem for this approach happens when automata 
+  of sub regular expressions are combined to form the automaton of the mother regular expression, 
+  no matter what kind of representation is used, the formalization is cubersome, as said 
+  by Nipkow in [5]: `{\em a more abstract mathod is clearly desirable}'. In this section, 
+  an {\em intrinsically abstract} method is given, which completely avoid the mentioning of automata,
+  let along any particular representations.
+  *}
+
+text {*
+  The main proof structure is a structural induction on regular expressions,
+  where base cases (cases for @{const "NULL"}, @{const "EMPTY"}, @{const "CHAR"}) are quite straightforward to
+  proof. Real difficulty lies in inductive cases.  By inductive hypothesis, languages defined by
+  sub-expressions induce finite partitiions. Under such hypothsis, we need to prove that the language 
+  defined by the composite regular expression gives rise to finite partion.  
+  The basic idea is to attach a tag @{text "tag(x)"} to every string 
+  @{text "x"}. The tagging fuction @{text "tag"} is carefully devised, which returns tags 
+  made of equivalent classes of the partitions induced by subexpressoins, and therefore has a finite
+  range. Let @{text "Lang"} be the composite language, it is proved that:
+  \begin{quote}
+  If strings with the same tag are equivalent with respect to @{text "Lang"}, expressed as:
+  \[
+  @{text "tag(x) = tag(y) \<Longrightarrow> x \<approx>Lang y"}
+  \]
+  then the partition induced by @{text "Lang"} must be finite.
+  \end{quote}
+  There are two arguments for this. The first goes as the following:
+  \begin{enumerate}
+    \item First, the tagging function @{text "tag"} induces an equivalent relation @{text "(=tag=)"} 
+          (defiintion of @{text "f_eq_rel"} and lemma @{text "equiv_f_eq_rel"}).
+    \item It is shown that: if the range of @{text "tag"} (denoted @{text "range(tag)"}) is finite, 
+           the partition given rise by @{text "(=tag=)"} is finite (lemma @{text "finite_eq_f_rel"}).
+           Since tags are made from equivalent classes from component partitions, and the inductive
+           hypothesis ensures the finiteness of these partitions, it is not difficult to prove
+           the finiteness of @{text "range(tag)"}.
+    \item It is proved that if equivalent relation @{text "R1"} is more refined than @{text "R2"}
+           (expressed as @{text "R1 \<subseteq> R2"}),
+           and the partition induced by @{text "R1"} is finite, then the partition induced by @{text "R2"}
+           is finite as well (lemma @{text "refined_partition_finite"}).
+    \item The injectivity assumption @{text "tag(x) = tag(y) \<Longrightarrow> x \<approx>Lang y"} implies that
+            @{text "(=tag=)"} is more refined than @{text "(\<approx>Lang)"}.
+    \item Combining the points above, we have: the partition induced by language @{text "Lang"}
+          is finite (lemma @{text "tag_finite_imageD"}).
+  \end{enumerate}
+
+We could have followed another approach given in  appendix II of Brzozowski's paper [?], where
+the set of derivatives of any regular expression can be proved to be finite. 
+Since it is easy to show that strings with same derivative are equivalent with respect to the 
+language, then the second direction follows. We believe that our
+apporoach is easy to formalize, with no need to do complicated derivation calculations
+and countings as in [???].
+*}
+
+
 end

--- a/Myhill_1.thy	Fri Feb 11 13:30:37 2011 +0000
+++ b/Myhill_1.thy	Sun Feb 13 10:36:53 2011 +0000
@@ -184,10 +184,8 @@
 qed
 
 
-
 section {* A modified version of Arden's lemma *}
 
-
 text {*  A helper lemma for Arden *}
 
 lemma arden_helper:
@@ -306,7 +304,6 @@
 done
 
 
-
 section {* Direction @{text "finite partition \<Rightarrow> regular language"} *}
 
 
@@ -349,10 +346,8 @@
 unfolding finals_def quotient_def
 by auto
 
-
 section {* Equational systems *}
 
-
 text {* The two kinds of terms in the rhs of equations. *}
 
 datatype rhs_item = 
@@ -499,8 +494,6 @@
 where
   "solve X ES = (if (card ES = 1) then ES else solve X (Iter X ES))"
 
-thm solve.simps
-
 
 text {*
   Since the @{text "while"} combinator from HOL library is used to implement @{text "Solve X ES"},

--- a/Myhill_2.thy	Fri Feb 11 13:30:37 2011 +0000
+++ b/Myhill_2.thy	Sun Feb 13 10:36:53 2011 +0000
@@ -19,18 +19,22 @@
 
 text {*
   The main lemma (@{text "rexp_imp_finite"}) is proved by a structural induction over regular expressions.
-  While base cases (cases for @{const "NULL"}, @{const "EMPTY"}, @{const "CHAR"}) are quite straight forward,
-  the inductive cases are rather involved. What we have when starting to prove these inductive caes is that
-  the partitions induced by the componet language are finite. The basic idea to show the finiteness of the 
-  partition induced by the composite language is to attach a tag @{text "tag(x)"} to every string 
-  @{text "x"}. The tags are made of equivalent classes from the component partitions. Let @{text "tag"}
-  be the tagging function and @{text "Lang"} be the composite language, it can be proved that
-  if strings with the same tag are equivalent with respect to @{text "Lang"}, expressed as:
+  where base cases (cases for @{const "NULL"}, @{const "EMPTY"}, @{const "CHAR"}) are quite straightforward to
+  proof. Real difficulty lies in inductive cases.  By inductive hypothesis, languages defined by
+  sub-expressions induce finite partitiions. Under such hypothsis, we need to prove that the language 
+  defined by the composite regular expression gives rise to finite partion.  
+  The basic idea is to attach a tag @{text "tag(x)"} to every string 
+  @{text "x"}. The tagging fuction @{text "tag"} is carefully devised, which returns tags 
+  made of equivalent classes of the partitions induced by subexpressoins, and therefore has a finite
+  range. Let @{text "Lang"} be the composite language, it is proved that:
+  \begin{quote}
+  If strings with the same tag are equivalent with respect to @{text "Lang"}, expressed as:
   \[
   @{text "tag(x) = tag(y) \<Longrightarrow> x \<approx>Lang y"}
   \]
-  then the partition induced by @{text "Lang"} must be finite. There are two arguments for this. 
-  The first goes as the following:
+  then the partition induced by @{text "Lang"} must be finite.
+  \end{quote}
+  There are two arguments for this. The first goes as the following:
   \begin{enumerate}
     \item First, the tagging function @{text "tag"} induces an equivalent relation @{text "(=tag=)"} 
           (defiintion of @{text "f_eq_rel"} and lemma @{text "equiv_f_eq_rel"}).

Binary file tphols-2011/myhill.pdf has changed

author	zhang
	Sun, 13 Feb 2011 10:36:53 +0000
changeset 99	54aa3b6dd71c
parent 98	36f9d19be0e6
child 100	2409827d8eb8

Literature/rutten.pdf		file \| annotate \| diff \| comparison \| revisions
Myhill.thy		file \| annotate \| diff \| comparison \| revisions
Myhill_1.thy		file \| annotate \| diff \| comparison \| revisions
Myhill_2.thy		file \| annotate \| diff \| comparison \| revisions
tphols-2011/myhill.pdf		file \| annotate \| diff \| comparison \| revisions