regexp: changeset 90:97b783438316

--- a/Paper/Paper.thy	Wed Feb 09 07:27:30 2011 +0000
+++ b/Paper/Paper.thy	Wed Feb 09 09:46:59 2011 +0000
@@ -200,7 +200,7 @@
   being represented by the empty list, written @{term "[]"}. \emph{Languages}
   are sets of strings. The language containing all strings is written in
   Isabelle/HOL as @{term "UNIV::string set"}. The concatenation of two languages 
-  is written @{term "A ;; B"} and a language raised to the power $n$ is written 
+  is written @{term "A ;; B"} and a language raised to the power @{text n} is written 
   @{term "A \<up> n"}. Their definitions are
 
   \begin{center}
@@ -228,15 +228,15 @@
   our formalisation.\footnote{Available at ???}
 
 
-  The notation for the quotient of a language @{text A} according to an 
-  equivalence relation @{term REL} is in Isabelle/HOL @{term "A // REL"}. We will write 
+  The notation in Isabelle/HOL for the quotient of a language @{text A} according to an 
+  equivalence relation @{term REL} is @{term "A // REL"}. We will write 
   @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined 
   as @{text "{y | y \<approx> x}"}.
 
 
   Central to our proof will be the solution of equational systems
   involving sets of languages. For this we will use Arden's lemma \cite{Brzozowski64}
-  which solves equations of the form @{term "X = A ;; X \<union> B"} provided
+  which solves equations of the form @{term "X = A ;; X \<union> B"} in case
   @{term "[] \<notin> A"}. However we will need the following ``reverse'' 
   version of Arden's lemma.
 
@@ -311,11 +311,10 @@
   \end{tabular}
   \end{center}
 
-  \noindent
-  Given a set or regular expressions @{text rs}, we will need the operation of generating 
-  a regular expressions that matches all languages of @{text rs}. We only need the existence
-  of such an regular expressions therefore we use Isabelle's @{const "fold_graph"} and Hilbert's
-  @{text "\<epsilon>"} to define @{term "\<Uplus>rs"} which, roughly speaking, folds @{const ALT} over the 
+  Given a set of regular expressions @{text rs}, we will need the operation of generating 
+  a corresponding regular expressions that matches all languages of @{text rs}. We only need the existence
+  of such a regular expressions and therefore we use Isabelle's @{const "fold_graph"} and Hilbert's
+  @{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This function, roughly speaking, folds @{const ALT} over the 
   set @{text rs} with @{const NULL} for the empty set. We can prove that for finite sets @{text rs}
 
   \begin{center}
@@ -323,8 +322,8 @@
   \end{center}
 
   \noindent
-  holds. (whereby @{text "\<calL> ` rs"} stands for the 
-  image of the set @{text rs} under function @{text "\<calL>"}).
+  holds, whereby @{text "\<calL> ` rs"} stands for the 
+  image of the set @{text rs} under function @{text "\<calL>"}.
   
 
 *}
@@ -344,7 +343,18 @@
   \noindent
   It is easy to see that @{term "\<approx>A"} is an equivalence relation, which
   partitions the set of all strings, @{text "UNIV"}, into a set of disjoint
-  equivalence classes. One direction of the Myhill-Nerode theorem establishes 
+  equivalence classes. An example is the regular language containing just
+  the string @{text "[c]"}, then @{term "\<approx>({[c]})"} partitions @{text UNIV}
+  into the three equivalence classes @{text "X\<^isub>1"}, @{text "X\<^isub>2"} and  @{text "X\<^isub>3"}
+  as follows
+  
+  \begin{center}
+  @{text "X\<^isub>1 = {[]}"}\hspace{5mm}
+  @{text "X\<^isub>2 = {[c]}"}\hspace{5mm}
+  @{text "X\<^isub>3 = UNIV - {[], [c]}"}
+  \end{center}
+
+  One direction of the Myhill-Nerode theorem establishes 
   that if there are finitely many equivalence classes, then the language is 
   regular. In our setting we therefore have to show:
   
@@ -353,7 +363,7 @@
   \end{theorem}
 
   \noindent
-  To prove this theorem, we define the set @{term "finals A"} as those equivalence
+  To prove this theorem, we first define the set @{term "finals A"} as those equivalence
   classes that contain strings of @{text A}, namely
   %
   \begin{equation} 
@@ -361,18 +371,19 @@
   \end{equation}
 
   \noindent
-  It is straightforward to show that @{thm lang_is_union_of_finals} and 
+  In our running example, @{text "X\<^isub>1"} is the only equivalence class in @{term "finals {[c]}"}.
+  It is straightforward to show that in general @{thm lang_is_union_of_finals} and 
   @{thm finals_in_partitions} hold. 
   Therefore if we know that there exists a regular expression for every
   equivalence class in @{term "finals A"} (which by assumption must be
-  a finite set), then we can combine these regular expressions with @{const ALT}
-  and obtain a regular expression that matches every string in @{text A}.
+  a finite set), then we can obtain a regular expression using @{text "\<bigplus>"} 
+  that matches every string in @{text A}.
 
 
-  We prove Thm.~\ref{myhillnerodeone} by giving a method that can calculate a
+  Our proof of Thm.~\ref{myhillnerodeone} relies on a method that can calculate a
   regular expression for \emph{every} equivalence class, not just the ones 
   in @{term "finals A"}. We
-  first define a notion of \emph{transition} between equivalence classes
+  first define the notion of \emph{transition} between equivalence classes
   %
   \begin{equation} 
   @{thm transition_def}
@@ -382,7 +393,9 @@
   which means that if we concatenate all strings matching the regular expression @{text r} 
   to the end of all strings in the equivalence class @{text Y}, we obtain a subset of 
   @{text X}. Note that we do not define an automaton here, we merely relate two sets
-  (w.r.t.~a regular expression). 
+  (with the help of a regular expression). In our concrete example we have 
+  @{term "X\<^isub>1 \<Turnstile>(CHAR c)\<Rightarrow> X\<^isub>2"} and @{term "X\<^isub>1 \<Turnstile>r\<Rightarrow> X\<^isub>3"} with @{text r} being any 
+  other regular expression than @{const EMPTY} and @{term "CHAR c"}.
   
   Next we build an equational system that
   contains an equation for each equivalence class. Suppose we have 
@@ -404,8 +417,13 @@
   @{term "Y\<^isub>i\<^isub>j \<Turnstile>(CHAR c\<^isub>i\<^isub>j)\<Rightarrow> X\<^isub>i"}.  The term @{text "\<lambda>(EMPTY)"} acts as a marker for the equivalence
   class containing @{text "[]"}. (Note that we mark, roughly speaking, the
   single ``initial'' state in the equational system, which is different from
-  the method by Brzozowski \cite{Brzozowski64}, since for his purposes he needs to mark 
-  the ``terminal'' states.) Overloading the function @{text L} for the two kinds of terms in the 
+  the method by Brzozowski \cite{Brzozowski64}, since he marks the ``terminal'' 
+  states. We are forced to set up the equational system in this way, because 
+  the Myhill-Nerode relation determines the ``direction'' of the transitions. 
+  The successor ``state'' of an equivalence class @{text Y} can be reached by adding 
+  characters to the end of @{text Y}. This is also the reason why we have to use 
+  our reverse version of Arden's lemma.) 
+  Overloading the function @{text L} for the two kinds of terms in the 
   equational system as follows
   
   \begin{center}

--- a/Paper/document/root.tex	Wed Feb 09 07:27:30 2011 +0000
+++ b/Paper/document/root.tex	Wed Feb 09 09:46:59 2011 +0000
@@ -9,9 +9,8 @@
 \usepackage{ot1patch}
 \usepackage{times}
 \usepackage{proof}
+%%\usepackage{mathabx}
 \usepackage{stmaryrd}
-\usepackage{mathabx}
-
 
 \urlstyle{rm}
 \isabellestyle{it}
@@ -25,6 +24,9 @@
 \renewcommand{\isacharunderscore}{\mbox{$\_\!\_$}}
 
 \newcommand{\isasymcalL}{\ensuremath{\cal{L}}}
+\newcommand{\isasymbigplus}{\ensuremath{\bigplus}}
+
+\newcommand{\bigplus}{\mbox{\large\bf$+$}}
 \begin{document}
 
 \title{A Formalisation of the Myhill-Nerode Theorem\\ based on Regular

author	urbanc
	Wed, 09 Feb 2011 09:46:59 +0000
changeset 90	97b783438316
parent 89	42af13d194c9
child 91	37ab56205097

Paper/Paper.thy		file \| annotate \| diff \| comparison \| revisions
Paper/document/root.tex		file \| annotate \| diff \| comparison \| revisions