regexp: changeset 178:c6ebfe052109

--- a/Journal/Paper.thy	Thu Jul 28 01:12:02 2011 +0000
+++ b/Journal/Paper.thy	Thu Jul 28 11:56:25 2011 +0000
@@ -226,7 +226,7 @@
   \end{center}
 
   \noindent
-  On `paper' or a theorem prover based on set-theory, we can define the corresponding 
+  On `paper' we can define the corresponding 
   graph in terms of the disjoint 
   union of the state nodes. Unfortunately in HOL, the standard definition for disjoint 
   union, namely 
@@ -241,7 +241,7 @@
   single type for automata. As a result we will not be able to define a regular
   language as one for which there exists an automaton that recognises all its
   strings. This is because we cannot make a definition in HOL that is polymorphic in 
-  the state type and also there is no type quantification available in HOL (unlike 
+  the state type and there is no type quantification available in HOL (unlike 
   in Coq, for example).
 
   An alternative, which provides us with a single type for automata, is to give every 
@@ -316,7 +316,7 @@
   Isabelle/HOL nor will we attempt to formalise automata proofs from the
   literature, but take a different approach to regular languages than is
   usually taken. Instead of defining a regular language as one where there
-  exists an automaton that recognises all strings of the language, we define a
+  exists an automaton that recognises all its strings, we define a
   regular language as:
 
   \begin{dfntn}
@@ -329,7 +329,7 @@
   functions, can be easily defined as an inductive datatype. A reasoning
   infrastructure (like induction and recursion) comes then for free in
   HOL. Moreover, no side-conditions will be needed for regular expressions,
-  like we usually need for automata. This convenience of regular expressions has
+  like we need for automata. This convenience of regular expressions has
   recently been exploited in HOL4 with a formalisation of regular expression
   matching based on derivatives \cite{OwensSlind08} and with an equivalence
   checker for regular expressions in Isabelle/HOL \cite{KraussNipkow11}.  The
@@ -350,10 +350,10 @@
   certain tagging-functions, and another indirect proof using Antimirov's
   partial derivatives \cite{Antimirov95}. Again to our best knowledge, the
   tagging-functions have not been used before to establish the Myhill-Nerode
-  theorem. Derivatives of regular expressions have been used widely in the
-  literature about regular expressions. However, partial derivatives are more
-  suitable in the context of the Myhill-Nerode theorem, since it is easier to
-  establish formally their finiteness result.
+  theorem. Derivatives of regular expressions have been recently used quite
+  widely in the literature about regular expressions. However, partial
+  derivatives are more suitable in the context of the Myhill-Nerode theorem,
+  since it is easier to establish formally their finiteness result.
 
 *}
 
@@ -584,8 +584,9 @@
   strings in the equivalence class @{text Y}, we obtain a subset of 
   @{text X}. Note that we do not define an automaton here, we merely relate two sets
   (with the help of a character). In our concrete example we have 
-  @{term "X\<^isub>1 \<Turnstile>c\<Rightarrow> X\<^isub>2"}, @{term "X\<^isub>1 \<Turnstile>d\<Rightarrow> X\<^isub>3"} with @{text d} being any 
-  other character than @{text c}, and @{term "X\<^isub>3 \<Turnstile>d\<Rightarrow> X\<^isub>3"} for any @{text d}.
+  @{term "X\<^isub>1 \<Turnstile>c\<Rightarrow> X\<^isub>2"}, @{term "X\<^isub>1 \<Turnstile>d\<^isub>i\<Rightarrow> X\<^isub>3"} with @{text "d\<^isub>i"} being any 
+  other character than @{text c}, and @{term "X\<^isub>3 \<Turnstile>c\<^isub>j\<Rightarrow> X\<^isub>3"} for any 
+  caracter @{text "c\<^isub>j"}.
   
   Next we construct an \emph{initial equational system} that
   contains an equation for each equivalence class. We first give
@@ -640,10 +641,7 @@
   \noindent
   where @{text "d\<^isub>1\<dots>d\<^isub>n"} is the sequence of all characters
   except @{text c}, and @{text "c\<^isub>1\<dots>c\<^isub>m"} is the sequence of all
-  characters.  In our initial equation systems there can only be finitely many
-  terms of the form @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"},
-  since by assumption there are only finitely many equivalence classes and
-  only finitely many characters.
+  characters.  
 
   Overloading the function @{text \<calL>} for the two kinds of terms in the
   equational system, we have
@@ -745,10 +743,31 @@
   In this definition, we first delete all terms of the form @{text "(X, r)"} from @{text rhs};
   then we calculate the combined regular expressions for all @{text r} coming 
   from the deleted @{text "(X, r)"}, and take the @{const Star} of it;
-  finally we append this regular expression to @{text rhs'}. It can be easily seen 
-  that this operation mimics Arden's Lemma on the level of equations. To ensure
-  the non-emptiness condition of Arden's Lemma we say that a right-hand side is
-  @{text ardenable} provided
+  finally we append this regular expression to @{text rhs'}. If we apply this
+  operation to the right-hand side of @{text "X\<^isub>3"} in \eqref{exmpcs}, we obtain
+  the equation:
+
+  \begin{center}
+  \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
+  @{term "X\<^isub>3"} & @{text "="} & 
+    @{text "(X\<^isub>1, TIMES (ATOM d\<^isub>1) (STAR \<^raw:\ensuremath{\bigplus}>{ATOM c\<^isub>1,\<dots>,ATOM c\<^isub>m})) + \<dots> "}\\
+  & & \mbox{}\hspace{13mm}
+     @{text "\<dots> + (X\<^isub>1, TIMES (ATOM d\<^isub>n) (STAR \<^raw:\ensuremath{\bigplus}>{ATOM c\<^isub>1,\<dots>,ATOM c\<^isub>m}))"}
+  \end{tabular}
+  \end{center}
+
+
+  \noindent
+  That means we eliminated the dependency of @{text "X\<^isub>3"} on the
+  right-hand side.  Note we used the abbreviation 
+  @{text "\<^raw:\ensuremath{\bigplus}>{ATOM c\<^isub>1,\<dots>,ATOM c\<^isub>m}"} 
+  to stand for a regular expression that matches with every character. In 
+  our algorithm we are only interested in the existence of such a regular expresion
+  and not specify it any further. 
+
+  It can be easily seen that the @{text "Arden"}-operation mimics Arden's
+  Lemma on the level of equations. To ensure the non-emptiness condition of
+  Arden's Lemma we say that a right-hand side is @{text ardenable} provided
 
   \begin{center}
   @{thm ardenable_def}
@@ -785,7 +804,15 @@
   the regular expression corresponding to the deleted terms; finally we append this
   regular expression to @{text "xrhs"} and union it up with @{text rhs'}. When we use
   the substitution operation we will arrange it so that @{text "xrhs"} does not contain
-  any occurrence of @{text X}.
+  any occurrence of @{text X}. For example substituting the first equation in
+  \eqref{exmpcs} into the right-hand side of the second, thus eliminating the equivalence 
+  class @{text "X\<^isub>1"}, gives us the equation
+
+  \begin{equation}\label{exmpresult}
+  \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
+  @{term "X\<^isub>2"} & @{text "="} & @{text "\<lambda>(TIMES ONE (ATOM c))"}\\
+  \end{tabular}}
+  \end{equation}
 
   With these two operations in place, we can define the operation that removes one equation
   from an equational systems @{text ES}. The operation @{const Subst_all}
author	urbanc
	Thu, 28 Jul 2011 11:56:25 +0000
changeset 178	c6ebfe052109
parent 177	50cc1a39c990
child 179	edacc141060f