regexp: comparison Paper/Paper.thy

-:42af13d194c9
+:97b783438316
 text {*
 Strings in Isabelle/HOL are lists of characters with the \emph{empty string}
 being represented by the empty list, written @{term "[]"}. \emph{Languages}
 are sets of strings. The language containing all strings is written in
 Isabelle/HOL as @{term "UNIV::string set"}. The concatenation of two languages
-is written @{term "A ;; B"} and a language raised to the power $n$ is written
+is written @{term "A ;; B"} and a language raised to the power @{text n} is written
 @{term "A \<up> n"}. Their definitions are
 \begin{center}
 @{thm Seq_def[THEN eq_reflection, where A1="A" and B1="B"]}
 \hspace{7mm}
 \noindent
 We omit the proofs, but invite the reader to consult
 our formalisation.\footnote{Available at ???}
-The notation for the quotient of a language @{text A} according to an
-equivalence relation @{term REL} is in Isabelle/HOL @{term "A // REL"}. We will write
+The notation in Isabelle/HOL for the quotient of a language @{text A} according to an
+equivalence relation @{term REL} is @{term "A // REL"}. We will write
 @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined
 as @{text "{y | y \<approx> x}"}.
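As an informal illustration only, not part of the formalisation, the following Python sketch computes equivalence classes and a quotient for a finite carrier set. It assumes the relation is given as a Boolean function and is an equivalence on that carrier. The helpers `eq_class` and `quotient` are hypothetical names, and classes are formed only within the carrier rather than over all of UNIV.

```python
def eq_class(x, carrier, rel):
    """The class [x] = {y | y ~ x}, restricted to the finite carrier."""
    return frozenset(y for y in carrier if rel(y, x))

def quotient(carrier, rel):
    """A // REL: the set of equivalence classes of the elements of A."""
    return {eq_class(x, carrier, rel) for x in carrier}

def same_length(x, y):
    """An example equivalence relation on strings."""
    return len(x) == len(y)

# Strings of length <= 2 over {c, d}, related when they have the same length.
strings = {"", "c", "d", "cc", "cd", "dc", "dd"}
print(sorted(sorted(X) for X in quotient(strings, same_length)))
# [[''], ['c', 'd'], ['cc', 'cd', 'dc', 'dd']]
```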
 Central to our proof will be the solution of equational systems
 involving sets of languages. For this we will use Arden's lemma \cite{Brzozowski64}
-which solves equations of the form @{term "X = A ;; X \<union> B"} provided
+which solves equations of the form @{term "X = A ;; X \<union> B"} in case
 @{term "[] \<notin> A"}. However, we will need the following ``reverse''
 version of Arden's lemma.
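Reverse Arden's lemma concerns equations of the form X = X ;; A ∪ B, the mirror image of X = A ;; X ∪ B. When [] ∉ A such an equation has the unique solution X = B ;; A⋆. The Python sketch below is a finite sanity check, not a proof. It assumes languages are truncated to strings of length at most n over the two-letter alphabet {c, d}, uses the hypothetical helpers `seq`, `star` and `solutions`, and enumerates every candidate X by brute force.

```python
from itertools import chain, combinations, product

def seq(A, B, n):
    """A ;; B, keeping only strings of length <= n."""
    return {a + b for a in A for b in B if len(a + b) <= n}

def star(A, n):
    """A* truncated to length <= n, computed as a least fixpoint."""
    result = {""}
    while True:
        new = result | seq(result, A, n)
        if new == result:
            return result
        result = new

def solutions(A, B, alphabet, n):
    """Every X over strings of length <= n with X = X ;; A  u  B (brute force)."""
    universe = ["".join(p) for k in range(n + 1) for p in product(alphabet, repeat=k)]
    candidates = chain.from_iterable(combinations(universe, k) for k in range(len(universe) + 1))
    return [set(X) for X in candidates if set(X) == seq(set(X), A, n) | B]

A, B, n = {"c"}, {"d"}, 3
print(solutions(A, B, "cd", n))           # exactly one solution ...
print(seq(B, star(A, n), n))              # ... and it coincides with B ;; A*
print(len(solutions({""}, B, "cd", n)))   # 16384 = 2**14: every X containing "d" solves it
```

The last line shows why the premise matters: once [] ∈ A, every X that contains the string d solves the equation, so the solution is no longer unique.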
 \begin{lemma}[Reverse Arden's Lemma]\label{arden}\mbox{}\\
 If @{thm (prem 1) arden} then
 @{thm (rhs) L_rexp.simps(6)[where r="r"]}\\
 \end{tabular}
 \end{tabular}
 \end{center}
-\noindent
-Given a set or regular expressions @{text rs}, we will need the operation of generating
-a regular expressions that matches all languages of @{text rs}. We only need the existence
-of such an regular expressions therefore we use Isabelle's @{const "fold_graph"} and Hilbert's
-@{text "\<epsilon>"} to define @{term "\<Uplus>rs"} which, roughly speaking, folds @{const ALT} over the
+Given a set of regular expressions @{text rs}, we will need the operation of generating
+a corresponding regular expression that matches exactly the strings in the languages of @{text rs}. We only need the existence
+of such a regular expression, and therefore we use Isabelle's @{const "fold_graph"} and Hilbert's
+@{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This function, roughly speaking, folds @{const ALT} over the
 set @{text rs} with @{const NULL} for the empty set. We can prove that for finite sets @{text rs}
 \begin{center}
 @{thm (lhs) folds_alt_simp}@{text "= \<Union> (\<calL> ` rs)"}
 \end{center}
 \noindent
-holds. (whereby @{text "\<calL> ` rs"} stands for the
-image of the set @{text rs} under function @{text "\<calL>"}).
+holds, whereby @{text "\<calL> ` rs"} stands for the
+image of the set @{text rs} under function @{text "\<calL>"}.
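A minimal Python sketch of the folding, assuming a hypothetical tuple encoding of the constructors NULL, EMPTY, CHAR, SEQ, ALT and STAR, and languages truncated to strings of length at most n. It folds ALT over a finite set starting from NULL and checks that the language of the result is the union of the individual languages. It illustrates the equation above; it is not the definition via fold_graph and Hilbert's choice operator.

```python
from functools import reduce

# Hypothetical tuple encoding of the regular-expression constructors.
NULL, EMPTY = ("NULL",), ("EMPTY",)

def CHAR(c):
    return ("CHAR", c)

def SEQ(r1, r2):
    return ("SEQ", r1, r2)

def ALT(r1, r2):
    return ("ALT", r1, r2)

def STAR(r):
    return ("STAR", r)

def lang(r, n):
    """L(r), truncated to strings of length <= n."""
    tag = r[0]
    if tag == "NULL":
        return set()
    if tag == "EMPTY":
        return {""}
    if tag == "CHAR":
        return {r[1]} if n >= 1 else set()
    if tag == "ALT":
        return lang(r[1], n) | lang(r[2], n)
    if tag == "SEQ":
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n) if len(x + y) <= n}
    # STAR: least fixpoint of S = {[]} u S ;; L(r)
    base, result = lang(r[1], n), {""}
    while True:
        new = result | {x + y for x in result for y in base if len(x + y) <= n}
        if new == result:
            return result
        result = new

def alts(rs):
    """Fold ALT over a finite set of regular expressions, NULL for the empty set."""
    return reduce(ALT, rs, NULL)

rs = {CHAR("c"), SEQ(CHAR("c"), CHAR("d")), STAR(CHAR("d"))}
print(sorted(lang(alts(rs), 3)))   # ['', 'c', 'cd', 'd', 'dd', 'ddd']
```

Python iterates over a set in an arbitrary order, much as the choice operator picks some fold. Because ALT denotes union, which is associative and commutative, the resulting language does not depend on that order.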
 *}
 section {* Finite Partitions Imply Regularity of a Language *}
 \end{definition}
 \noindent
 It is easy to see that @{term "\<approx>A"} is an equivalence relation, which
 partitions the set of all strings, @{text "UNIV"}, into a set of disjoint
-equivalence classes. One direction of the Myhill-Nerode theorem establishes
+equivalence classes. For example, consider the regular language containing just
+the string @{text "[c]"}; then @{term "\<approx>({[c]})"} partitions @{text UNIV}
+into the three equivalence classes @{text "X\<^isub>1"}, @{text "X\<^isub>2"} and @{text "X\<^isub>3"}
+as follows
+\begin{center}
+@{text "X\<^isub>1 = {[]}"}\hspace{5mm}
+@{text "X\<^isub>2 = {[c]}"}\hspace{5mm}
+@{text "X\<^isub>3 = UNIV - {[], [c]}"}
+\end{center}
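The three classes can be recomputed mechanically. The Python sketch below is an illustration under stated assumptions: strings are Python str, the alphabet is {c, d}, and only strings up to a chosen length are grouped. Two strings land in the same class when the same suffixes lead them into A. Because A is finite, suffixes longer than its longest word can be ignored, so this test decides x ≈A y exactly. The helper names are hypothetical.

```python
from itertools import product

A = {"c"}          # the example language {[c]}, strings encoded as Python str
ALPHABET = "cd"    # assumed alphabet

def strings_upto(n):
    """All strings over ALPHABET of length <= n."""
    return ["".join(p) for k in range(n + 1) for p in product(ALPHABET, repeat=k)]

# For a finite A, a suffix longer than every word of A never leads into A,
# so testing suffixes up to that length decides x ~A y exactly.
SUFFIXES = strings_upto(max(len(w) for w in A))

def signature(x):
    """The suffixes z with x @ z in A; x ~A y iff their signatures agree."""
    return frozenset(z for z in SUFFIXES if x + z in A)

def classes(n):
    """Group the strings of length <= n into their ~A equivalence classes."""
    parts = {}
    for x in strings_upto(n):
        parts.setdefault(signature(x), set()).add(x)
    return list(parts.values())

print(sorted(sorted(X) for X in classes(2)))
# [[''], ['c'], ['cc', 'cd', 'd', 'dc', 'dd']]
```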
+One direction of the Myhill-Nerode theorem establishes
 that if there are finitely many equivalence classes, then the language is
 regular. In our setting we therefore have to show:
 \begin{theorem}\label{myhillnerodeone}
 @{thm[mode=IfThen] hard_direction}
 \end{theorem}
 \noindent
-To prove this theorem, we define the set @{term "finals A"} as those equivalence
+To prove this theorem, we first define the set @{term "finals A"} as those equivalence
 classes that contain strings of @{text A}, namely
 %
 \begin{equation}
 @{thm finals_def}
 \end{equation}
 \noindent
-It is straightforward to show that @{thm lang_is_union_of_finals} and
+In our running example, @{text "X\<^isub>2"} is the only equivalence class in @{term "finals {[c]}"}.
+It is straightforward to show that in general @{thm lang_is_union_of_finals} and
 @{thm finals_in_partitions} hold.
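For the running example, the final classes can be checked directly. This Python sketch takes the three classes listed earlier, truncates the infinite class X3 to strings of length at most N (an assumption made to keep the sets finite), keeps the classes that contain a string of A, and confirms that their union is A again.

```python
from itertools import product

A = {"c"}   # the running example {[c]}, strings encoded as Python str
N = 3       # hypothetical length bound for the infinite class X3

def strings_upto(n, alphabet="cd"):
    """All strings over the (assumed) alphabet of length <= n."""
    return {"".join(p) for k in range(n + 1) for p in product(alphabet, repeat=k)}

# The three classes of the running example, truncated to length <= N.
X1, X2 = {""}, {"c"}
X3 = strings_upto(N) - X1 - X2
partition = [X1, X2, X3]

# finals A: the classes that contain at least one string of A.
finals = [X for X in partition if X & A]

print(finals)                      # [{'c'}]
print(set().union(*finals) == A)   # True: A is the union of its final classes
```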
 Therefore, if we know that there exists a regular expression for every
 equivalence class in @{term "finals A"} (which by assumption must be
-a finite set), then we can combine these regular expressions with @{const ALT}
+a finite set), then we can obtain a regular expression using @{text "\<bigplus>"}
-and obtain a regular expression that matches every string in @{text A}.
+that matches every string in @{text A}.
-We prove Thm.~\ref{myhillnerodeone} by giving a method that can calculate a
+Our proof of Thm.~\ref{myhillnerodeone} relies on a method that can calculate a
 regular expression for \emph{every} equivalence class, not just the ones
 in @{term "finals A"}. We
-first define a notion of \emph{transition} between equivalence classes
+first define the notion of \emph{transition} between equivalence classes
 %
 \begin{equation}
 @{thm transition_def}
 \end{equation}
 \noindent
 which means that if we concatenate all strings matching the regular expression @{text r}
 to the end of all strings in the equivalence class @{text Y}, we obtain a subset of
 @{text X}. Note that we do not define an automaton here, we merely relate two sets
-(w.r.t.~a regular expression).
+(with the help of a regular expression). In our concrete example we have
+@{term "X\<^isub>1 \<Turnstile>(CHAR c)\<Rightarrow> X\<^isub>2"} and @{term "X\<^isub>1 \<Turnstile>r\<Rightarrow> X\<^isub>3"} for every regular expression @{text r}
+whose language contains neither @{text "[]"} nor @{text "[c]"}.
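The transition relation can be tested on the running example. The Python sketch below truncates X3 to strings of length at most N and, for simplicity, passes the finite language L(r) instead of a regular expression r; both are assumptions of the sketch, and `transition` is a hypothetical helper. The last two calls show that X1 ⊨r⇒ X3 fails as soon as L(r) contains [] or [c].

```python
from itertools import product

N = 3  # hypothetical length bound; strings are Python str over {c, d}

def strings_upto(n, alphabet="cd"):
    """All strings over the (assumed) alphabet of length <= n."""
    return {"".join(p) for k in range(n + 1) for p in product(alphabet, repeat=k)}

# The three classes of the running example, truncated to length <= N.
X1, X2 = {""}, {"c"}
X3 = strings_upto(N) - X1 - X2

def transition(Y, lang_r, X):
    """Y |=r=> X  iff  Y ;; L(r) is a subset of X (checked up to length N).

    The finite language L(r) is passed instead of r itself."""
    return {y + w for y in Y for w in lang_r if len(y + w) <= N} <= X

print(transition(X1, {"c"}, X2))        # L(CHAR c) = {c}:   True
print(transition(X1, {"d"}, X3))        # L(CHAR d) = {d}:   True
print(transition(X1, set(), X3))        # L(NULL) is empty:  True
print(transition(X1, {""}, X3))         # L(EMPTY) = {[]}:   False
print(transition(X1, {"c", "d"}, X3))   # contains [c]:      False
```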
 Next we build an equational system that
 contains an equation for each equivalence class. Suppose we have
 the equivalence classes @{text "X\<^isub>1,\<dots>,X\<^isub>n"}; then exactly one of them
 contains the empty string @{text "[]"} (since the equivalence classes are disjoint and together cover @{text UNIV}).
 \noindent
 where the pairs @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} stand for all transitions
 @{term "Y\<^isub>i\<^isub>j \<Turnstile>(CHAR c\<^isub>i\<^isub>j)\<Rightarrow> X\<^isub>i"}.  The term @{text "\<lambda>(EMPTY)"} acts as a marker for the equivalence
 class containing @{text "[]"}. (Note that we mark, roughly speaking, the
 single ``initial'' state in the equational system, which is different from
-the method by Brzozowski \cite{Brzozowski64}, since for his purposes he needs to mark
+the method by Brzozowski \cite{Brzozowski64}, since he marks the ``terminal''
-the ``terminal'' states.) Overloading the function @{text L} for the two kinds of terms in the
+states. We are forced to set up the equational system in this way, because
+the Myhill-Nerode relation determines the ``direction'' of the transitions.
+The successor ``state'' of an equivalence class @{text Y} is reached by adding
+characters to the end of the strings in @{text Y}. This is also the reason why we have to use
+our reverse version of Arden's lemma.)
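To see the shape of the equational system, the following Python sketch builds it for the running example. It assumes the alphabet {c, d}, truncates the classes to strings of length at most N, encodes a term Y ;; CHAR c as a pair (j, c) with j the index of Y, and writes the marker λ(EMPTY) as a string; all of these encodings are hypothetical. Because ≈A is a right congruence (appending a character to equivalent strings yields equivalent strings), each Y ;; CHAR c lands in a single class, so checking short strings is enough for this example.

```python
from itertools import product

N = 3            # hypothetical length bound
ALPHABET = "cd"  # assumed alphabet

def strings_upto(n):
    """All strings over ALPHABET of length <= n."""
    return {"".join(p) for k in range(n + 1) for p in product(ALPHABET, repeat=k)}

# The three classes of the running example {[c]}, truncated to length <= N.
X1, X2 = {""}, {"c"}
X3 = strings_upto(N) - X1 - X2
classes = [X1, X2, X3]

def equational_system(classes):
    """For each class X_i, list the terms on the right-hand side of its equation.

    A pair (j, c) stands for X_j ;; CHAR c and is added whenever appending c
    to the strings of X_j stays inside X_i; the string "lambda(EMPTY)" marks
    the class containing []."""
    system = []
    for X in classes:
        rhs = ["lambda(EMPTY)"] if "" in X else []
        for j, Y in enumerate(classes, start=1):
            for c in ALPHABET:
                if {y + c for y in Y if len(y) < N} <= X:
                    rhs.append((j, c))
        system.append(rhs)
    return system

for i, rhs in enumerate(equational_system(classes), start=1):
    print(f"X{i} = {rhs}")
# X1 = ['lambda(EMPTY)']
# X2 = [(1, 'c')]
# X3 = [(1, 'd'), (2, 'c'), (2, 'd'), (3, 'c'), (3, 'd')]
```

Read back, the system is X1 = λ(EMPTY), X2 = X1 ;; CHAR c, and X3 = X1 ;; CHAR d ∪ X2 ;; CHAR c ∪ X2 ;; CHAR d ∪ X3 ;; CHAR c ∪ X3 ;; CHAR d. Every class appears on the left of the character terms, which is why the reverse form of Arden's lemma is the one that applies.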
+Overloading the function @{text L} for the two kinds of terms in the
 equational system as follows
 \begin{center}
 @{thm L_rhs_item.simps(2)[where X="Y" and r="r", THEN eq_reflection]}\hspace{10mm}
 @{thm L_rhs_item.simps(1)[where r="r", THEN eq_reflection]}

changeset 90	97b783438316
parent 89	42af13d194c9
child 92	a9ebc410a5c8