# HG changeset patch # User urbanc # Date 1297244819 0 # Node ID 97b783438316a15fb735a9f3827d3435bfcc3da0 # Parent 42af13d194c918a96de36a9163ae310e1a577113 added an example diff -r 42af13d194c9 -r 97b783438316 Paper/Paper.thy --- a/Paper/Paper.thy Wed Feb 09 07:27:30 2011 +0000 +++ b/Paper/Paper.thy Wed Feb 09 09:46:59 2011 +0000 @@ -200,7 +200,7 @@ being represented by the empty list, written @{term "[]"}. \emph{Languages} are sets of strings. The language containing all strings is written in Isabelle/HOL as @{term "UNIV::string set"}. The concatenation of two languages - is written @{term "A ;; B"} and a language raised to the power $n$ is written + is written @{term "A ;; B"} and a language raised to the power @{text n} is written @{term "A \ n"}. Their definitions are \begin{center} @@ -228,15 +228,15 @@ our formalisation.\footnote{Available at ???} - The notation for the quotient of a language @{text A} according to an - equivalence relation @{term REL} is in Isabelle/HOL @{term "A // REL"}. We will write + The notation in Isabelle/HOL for the quotient of a language @{text A} according to an + equivalence relation @{term REL} is @{term "A // REL"}. We will write @{text "\x\\<^isub>\"} for the equivalence class defined as @{text "{y | y \ x}"}. Central to our proof will be the solution of equational systems involving sets of languages. For this we will use Arden's lemma \cite{Brzozowski64} - which solves equations of the form @{term "X = A ;; X \ B"} provided + which solves equations of the form @{term "X = A ;; X \ B"} in case @{term "[] \ A"}. However we will need the following ``reverse'' version of Arden's lemma. @@ -311,11 +311,10 @@ \end{tabular} \end{center} - \noindent - Given a set or regular expressions @{text rs}, we will need the operation of generating - a regular expressions that matches all languages of @{text rs}. We only need the existence - of such an regular expressions therefore we use Isabelle's @{const "fold_graph"} and Hilbert's - @{text "\"} to define @{term "\rs"} which, roughly speaking, folds @{const ALT} over the + Given a set of regular expressions @{text rs}, we will need the operation of generating + a corresponding regular expressions that matches all languages of @{text rs}. We only need the existence + of such a regular expressions and therefore we use Isabelle's @{const "fold_graph"} and Hilbert's + @{text "\"} to define @{term "\rs"}. This function, roughly speaking, folds @{const ALT} over the set @{text rs} with @{const NULL} for the empty set. We can prove that for finite sets @{text rs} \begin{center} @@ -323,8 +322,8 @@ \end{center} \noindent - holds. (whereby @{text "\ ` rs"} stands for the - image of the set @{text rs} under function @{text "\"}). + holds, whereby @{text "\ ` rs"} stands for the + image of the set @{text rs} under function @{text "\"}. *} @@ -344,7 +343,18 @@ \noindent It is easy to see that @{term "\A"} is an equivalence relation, which partitions the set of all strings, @{text "UNIV"}, into a set of disjoint - equivalence classes. One direction of the Myhill-Nerode theorem establishes + equivalence classes. An example is the regular language containing just + the string @{text "[c]"}, then @{term "\({[c]})"} partitions @{text UNIV} + into the three equivalence classes @{text "X\<^isub>1"}, @{text "X\<^isub>2"} and @{text "X\<^isub>3"} + as follows + + \begin{center} + @{text "X\<^isub>1 = {[]}"}\hspace{5mm} + @{text "X\<^isub>2 = {[c]}"}\hspace{5mm} + @{text "X\<^isub>3 = UNIV - {[], [c]}"} + \end{center} + + One direction of the Myhill-Nerode theorem establishes that if there are finitely many equivalence classes, then the language is regular. In our setting we therefore have to show: @@ -353,7 +363,7 @@ \end{theorem} \noindent - To prove this theorem, we define the set @{term "finals A"} as those equivalence + To prove this theorem, we first define the set @{term "finals A"} as those equivalence classes that contain strings of @{text A}, namely % \begin{equation} @@ -361,18 +371,19 @@ \end{equation} \noindent - It is straightforward to show that @{thm lang_is_union_of_finals} and + In our running example, @{text "X\<^isub>1"} is the only equivalence class in @{term "finals {[c]}"}. + It is straightforward to show that in general @{thm lang_is_union_of_finals} and @{thm finals_in_partitions} hold. Therefore if we know that there exists a regular expression for every equivalence class in @{term "finals A"} (which by assumption must be - a finite set), then we can combine these regular expressions with @{const ALT} - and obtain a regular expression that matches every string in @{text A}. + a finite set), then we can obtain a regular expression using @{text "\"} + that matches every string in @{text A}. - We prove Thm.~\ref{myhillnerodeone} by giving a method that can calculate a + Our proof of Thm.~\ref{myhillnerodeone} relies on a method that can calculate a regular expression for \emph{every} equivalence class, not just the ones in @{term "finals A"}. We - first define a notion of \emph{transition} between equivalence classes + first define the notion of \emph{transition} between equivalence classes % \begin{equation} @{thm transition_def} @@ -382,7 +393,9 @@ which means that if we concatenate all strings matching the regular expression @{text r} to the end of all strings in the equivalence class @{text Y}, we obtain a subset of @{text X}. Note that we do not define an automaton here, we merely relate two sets - (w.r.t.~a regular expression). + (with the help of a regular expression). In our concrete example we have + @{term "X\<^isub>1 \(CHAR c)\ X\<^isub>2"} and @{term "X\<^isub>1 \r\ X\<^isub>3"} with @{text r} being any + other regular expression than @{const EMPTY} and @{term "CHAR c"}. Next we build an equational system that contains an equation for each equivalence class. Suppose we have @@ -404,8 +417,13 @@ @{term "Y\<^isub>i\<^isub>j \(CHAR c\<^isub>i\<^isub>j)\ X\<^isub>i"}. The term @{text "\(EMPTY)"} acts as a marker for the equivalence class containing @{text "[]"}. (Note that we mark, roughly speaking, the single ``initial'' state in the equational system, which is different from - the method by Brzozowski \cite{Brzozowski64}, since for his purposes he needs to mark - the ``terminal'' states.) Overloading the function @{text L} for the two kinds of terms in the + the method by Brzozowski \cite{Brzozowski64}, since he marks the ``terminal'' + states. We are forced to set up the equational system in this way, because + the Myhill-Nerode relation determines the ``direction'' of the transitions. + The successor ``state'' of an equivalence class @{text Y} can be reached by adding + characters to the end of @{text Y}. This is also the reason why we have to use + our reverse version of Arden's lemma.) + Overloading the function @{text L} for the two kinds of terms in the equational system as follows \begin{center} diff -r 42af13d194c9 -r 97b783438316 Paper/document/root.tex --- a/Paper/document/root.tex Wed Feb 09 07:27:30 2011 +0000 +++ b/Paper/document/root.tex Wed Feb 09 09:46:59 2011 +0000 @@ -9,9 +9,8 @@ \usepackage{ot1patch} \usepackage{times} \usepackage{proof} +%%\usepackage{mathabx} \usepackage{stmaryrd} -\usepackage{mathabx} - \urlstyle{rm} \isabellestyle{it} @@ -25,6 +24,9 @@ \renewcommand{\isacharunderscore}{\mbox{$\_\!\_$}} \newcommand{\isasymcalL}{\ensuremath{\cal{L}}} +\newcommand{\isasymbigplus}{\ensuremath{\bigplus}} + +\newcommand{\bigplus}{\mbox{\large\bf$+$}} \begin{document} \title{A Formalisation of the Myhill-Nerode Theorem\\ based on Regular