--- a/Paper/Paper.thy Wed Feb 09 07:27:30 2011 +0000
+++ b/Paper/Paper.thy Wed Feb 09 09:46:59 2011 +0000
@@ -200,7 +200,7 @@
being represented by the empty list, written @{term "[]"}. \emph{Languages}
are sets of strings. The language containing all strings is written in
Isabelle/HOL as @{term "UNIV::string set"}. The concatenation of two languages
- is written @{term "A ;; B"} and a language raised to the power $n$ is written
+ is written @{term "A ;; B"} and a language raised to the power @{text n} is written
@{term "A \<up> n"}. Their definitions are
\begin{center}
@@ -228,15 +228,15 @@
our formalisation.\footnote{Available at ???}
- The notation for the quotient of a language @{text A} according to an
- equivalence relation @{term REL} is in Isabelle/HOL @{term "A // REL"}. We will write
+ The notation in Isabelle/HOL for the quotient of a language @{text A} according to an
+ equivalence relation @{term REL} is @{term "A // REL"}. We will write
@{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined
as @{text "{y | y \<approx> x}"}.
Central to our proof will be the solution of equational systems
involving sets of languages. For this we will use Arden's lemma \cite{Brzozowski64}
- which solves equations of the form @{term "X = A ;; X \<union> B"} provided
+ which solves equations of the form @{term "X = A ;; X \<union> B"} in case
@{term "[] \<notin> A"}. However we will need the following ``reverse''
version of Arden's lemma.
@@ -311,11 +311,10 @@
\end{tabular}
\end{center}
- \noindent
- Given a set or regular expressions @{text rs}, we will need the operation of generating
- a regular expressions that matches all languages of @{text rs}. We only need the existence
- of such an regular expressions therefore we use Isabelle's @{const "fold_graph"} and Hilbert's
- @{text "\<epsilon>"} to define @{term "\<Uplus>rs"} which, roughly speaking, folds @{const ALT} over the
+ Given a set of regular expressions @{text rs}, we will need the operation of generating
+ a corresponding regular expressions that matches all languages of @{text rs}. We only need the existence
+ of such a regular expressions and therefore we use Isabelle's @{const "fold_graph"} and Hilbert's
+ @{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This function, roughly speaking, folds @{const ALT} over the
set @{text rs} with @{const NULL} for the empty set. We can prove that for finite sets @{text rs}
\begin{center}
@@ -323,8 +322,8 @@
\end{center}
\noindent
- holds. (whereby @{text "\<calL> ` rs"} stands for the
- image of the set @{text rs} under function @{text "\<calL>"}).
+ holds, whereby @{text "\<calL> ` rs"} stands for the
+ image of the set @{text rs} under function @{text "\<calL>"}.
*}
@@ -344,7 +343,18 @@
\noindent
It is easy to see that @{term "\<approx>A"} is an equivalence relation, which
partitions the set of all strings, @{text "UNIV"}, into a set of disjoint
- equivalence classes. One direction of the Myhill-Nerode theorem establishes
+ equivalence classes. An example is the regular language containing just
+ the string @{text "[c]"}, then @{term "\<approx>({[c]})"} partitions @{text UNIV}
+ into the three equivalence classes @{text "X\<^isub>1"}, @{text "X\<^isub>2"} and @{text "X\<^isub>3"}
+ as follows
+
+ \begin{center}
+ @{text "X\<^isub>1 = {[]}"}\hspace{5mm}
+ @{text "X\<^isub>2 = {[c]}"}\hspace{5mm}
+ @{text "X\<^isub>3 = UNIV - {[], [c]}"}
+ \end{center}
+
+ One direction of the Myhill-Nerode theorem establishes
that if there are finitely many equivalence classes, then the language is
regular. In our setting we therefore have to show:
@@ -353,7 +363,7 @@
\end{theorem}
\noindent
- To prove this theorem, we define the set @{term "finals A"} as those equivalence
+ To prove this theorem, we first define the set @{term "finals A"} as those equivalence
classes that contain strings of @{text A}, namely
%
\begin{equation}
@@ -361,18 +371,19 @@
\end{equation}
\noindent
- It is straightforward to show that @{thm lang_is_union_of_finals} and
+ In our running example, @{text "X\<^isub>1"} is the only equivalence class in @{term "finals {[c]}"}.
+ It is straightforward to show that in general @{thm lang_is_union_of_finals} and
@{thm finals_in_partitions} hold.
Therefore if we know that there exists a regular expression for every
equivalence class in @{term "finals A"} (which by assumption must be
- a finite set), then we can combine these regular expressions with @{const ALT}
- and obtain a regular expression that matches every string in @{text A}.
+ a finite set), then we can obtain a regular expression using @{text "\<bigplus>"}
+ that matches every string in @{text A}.
- We prove Thm.~\ref{myhillnerodeone} by giving a method that can calculate a
+ Our proof of Thm.~\ref{myhillnerodeone} relies on a method that can calculate a
regular expression for \emph{every} equivalence class, not just the ones
in @{term "finals A"}. We
- first define a notion of \emph{transition} between equivalence classes
+ first define the notion of \emph{transition} between equivalence classes
%
\begin{equation}
@{thm transition_def}
@@ -382,7 +393,9 @@
which means that if we concatenate all strings matching the regular expression @{text r}
to the end of all strings in the equivalence class @{text Y}, we obtain a subset of
@{text X}. Note that we do not define an automaton here, we merely relate two sets
- (w.r.t.~a regular expression).
+ (with the help of a regular expression). In our concrete example we have
+ @{term "X\<^isub>1 \<Turnstile>(CHAR c)\<Rightarrow> X\<^isub>2"} and @{term "X\<^isub>1 \<Turnstile>r\<Rightarrow> X\<^isub>3"} with @{text r} being any
+ other regular expression than @{const EMPTY} and @{term "CHAR c"}.
Next we build an equational system that
contains an equation for each equivalence class. Suppose we have
@@ -404,8 +417,13 @@
@{term "Y\<^isub>i\<^isub>j \<Turnstile>(CHAR c\<^isub>i\<^isub>j)\<Rightarrow> X\<^isub>i"}. The term @{text "\<lambda>(EMPTY)"} acts as a marker for the equivalence
class containing @{text "[]"}. (Note that we mark, roughly speaking, the
single ``initial'' state in the equational system, which is different from
- the method by Brzozowski \cite{Brzozowski64}, since for his purposes he needs to mark
- the ``terminal'' states.) Overloading the function @{text L} for the two kinds of terms in the
+ the method by Brzozowski \cite{Brzozowski64}, since he marks the ``terminal''
+ states. We are forced to set up the equational system in this way, because
+ the Myhill-Nerode relation determines the ``direction'' of the transitions.
+ The successor ``state'' of an equivalence class @{text Y} can be reached by adding
+ characters to the end of @{text Y}. This is also the reason why we have to use
+ our reverse version of Arden's lemma.)
+ Overloading the function @{text L} for the two kinds of terms in the
equational system as follows
\begin{center}
--- a/Paper/document/root.tex Wed Feb 09 07:27:30 2011 +0000
+++ b/Paper/document/root.tex Wed Feb 09 09:46:59 2011 +0000
@@ -9,9 +9,8 @@
\usepackage{ot1patch}
\usepackage{times}
\usepackage{proof}
+%%\usepackage{mathabx}
\usepackage{stmaryrd}
-\usepackage{mathabx}
-
\urlstyle{rm}
\isabellestyle{it}
@@ -25,6 +24,9 @@
\renewcommand{\isacharunderscore}{\mbox{$\_\!\_$}}
\newcommand{\isasymcalL}{\ensuremath{\cal{L}}}
+\newcommand{\isasymbigplus}{\ensuremath{\bigplus}}
+
+\newcommand{\bigplus}{\mbox{\large\bf$+$}}
\begin{document}
\title{A Formalisation of the Myhill-Nerode Theorem\\ based on Regular