Journal/Paper.thy
changeset 162 e93760534354
parent 160 ea2e5acbfe4a
child 167 61d0a412a3ae
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Journal/Paper.thy	Wed May 18 19:54:43 2011 +0000
@@ -0,0 +1,1403 @@
+(*<*)
+theory Paper
+imports "../Derivs" "~~/src/HOL/Library/LaTeXsugar" 
+begin
+
+declare [[show_question_marks = false]]
+
+consts
+ REL :: "(string \<times> string) \<Rightarrow> bool"
+ UPLUS :: "'a set \<Rightarrow> 'a set \<Rightarrow> (nat \<times> 'a) set"
+
+abbreviation
+  "EClass x R \<equiv> R `` {x}"
+
+abbreviation 
+  "Append_rexp2 r_itm r \<equiv> Append_rexp r r_itm"
+
+
+notation (latex output)
+  str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and
+  str_eq ("_ \<approx>\<^bsub>_\<^esub> _") and
+  Seq (infixr "\<cdot>" 100) and
+  Star ("_\<^bsup>\<star>\<^esup>") and
+  pow ("_\<^bsup>_\<^esup>" [100, 100] 100) and
+  Suc ("_+1" [100] 100) and
+  quotient ("_ \<^raw:\ensuremath{\!\sslash\!}> _" [90, 90] 90) and
+  REL ("\<approx>") and
+  UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and
+  L ("\<^raw:\ensuremath{\cal{L}}>'(_')" [0] 101) and
+  Lam ("\<lambda>'(_')" [100] 100) and 
+  Trn ("'(_, _')" [100, 100] 100) and 
+  EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and
+  transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100) and
+  Setalt ("\<^raw:\ensuremath{\bigplus}>_" [1000] 999) and
+  Append_rexp2 ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 100) and
+  Append_rexp_rhs ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 50) and
+  uminus ("\<^raw:\ensuremath{\overline{>_\<^raw:}}>" [100] 100) and
+  tag_str_ALT ("tag\<^isub>A\<^isub>L\<^isub>T _ _" [100, 100] 100) and
+  tag_str_ALT ("tag\<^isub>A\<^isub>L\<^isub>T _ _ _" [100, 100, 100] 100) and
+  tag_str_SEQ ("tag\<^isub>S\<^isub>E\<^isub>Q _ _" [100, 100] 100) and
+  tag_str_SEQ ("tag\<^isub>S\<^isub>E\<^isub>Q _ _ _" [100, 100, 100] 100) and
+  tag_str_STAR ("tag\<^isub>S\<^isub>T\<^isub>A\<^isub>R _" [100] 100) and
+  tag_str_STAR ("tag\<^isub>S\<^isub>T\<^isub>A\<^isub>R _ _" [100, 100] 100)
+lemma meta_eq_app:
+  shows "f \<equiv> \<lambda>x. g x \<Longrightarrow> f x \<equiv> g x"
+  by auto
+
+(*>*)
+
+
+section {* Introduction *}
+
+text {*
+  Regular languages are an important and well-understood subject in Computer
+  Science, with many beautiful theorems and many useful algorithms. There is a
+  wide range of textbooks on this subject, many of which are aimed at students
+  and contain very detailed `pencil-and-paper' proofs
+  (e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by
+  formalising the theorems and by verifying formally the algorithms.
+
+  There is however a problem: the typical approach to regular languages is to
+  introduce finite automata and then define everything in terms of them.  For
+  example, a regular language is normally defined as one whose strings are
+  recognised by a finite deterministic automaton. This approach has many
+  benefits. Among them is the fact that it is easy to convince oneself that
+  regular languages are closed under complementation: one just has to exchange
+  the accepting and non-accepting states in the corresponding automaton to
+  obtain an automaton for the complement language.  The problem, however, lies with
+  formalising such reasoning in a HOL-based theorem prover, in our case
+  Isabelle/HOL. Automata are built up from states and transitions that 
+  need to be represented as graphs, matrices or functions, none
+  of which can be defined as an inductive datatype. 
+
+  In case of graphs and matrices, this means we have to build our own
+  reasoning infrastructure for them, as neither Isabelle/HOL nor HOL4 nor
+  HOLlight support them with libraries. Even worse, reasoning about graphs and
+  matrices can be a real hassle in HOL-based theorem provers.  Consider for
+  example the operation of sequencing two automata, say $A_1$ and $A_2$, by
+  connecting the accepting states of $A_1$ to the initial state of $A_2$:\\[-5.5mm]  
+  %  
+  \begin{center}
+  \begin{tabular}{ccc}
+  \begin{tikzpicture}[scale=0.8]
+  %\draw[step=2mm] (-1,-1) grid (1,1);
+  
+  \draw[rounded corners=1mm, very thick] (-1.0,-0.3) rectangle (-0.2,0.3);
+  \draw[rounded corners=1mm, very thick] ( 0.2,-0.3) rectangle ( 1.0,0.3);
+
+  \node (A) at (-1.0,0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+  \node (B) at ( 0.2,0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+  
+  \node (C) at (-0.2, 0.13) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+  \node (D) at (-0.2,-0.13) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+
+  \node (E) at (1.0, 0.2) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+  \node (F) at (1.0,-0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+  \node (G) at (1.0,-0.2) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+
+  \draw (-0.6,0.0) node {\footnotesize$A_1$};
+  \draw ( 0.6,0.0) node {\footnotesize$A_2$};
+  \end{tikzpicture}
+
+  & 
+
+  \raisebox{1.1mm}{\bf\Large$\;\;\;\Rightarrow\,\;\;$}
+
+  &
+
+  \begin{tikzpicture}[scale=0.8]
+  %\draw[step=2mm] (-1,-1) grid (1,1);
+  
+  \draw[rounded corners=1mm, very thick] (-1.0,-0.3) rectangle (-0.2,0.3);
+  \draw[rounded corners=1mm, very thick] ( 0.2,-0.3) rectangle ( 1.0,0.3);
+
+  \node (A) at (-1.0,0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+  \node (B) at ( 0.2,0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+  
+  \node (C) at (-0.2, 0.13) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+  \node (D) at (-0.2,-0.13) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+
+  \node (E) at (1.0, 0.2) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+  \node (F) at (1.0,-0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+  \node (G) at (1.0,-0.2) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
+  
+  \draw (C) to [very thick, bend left=45] (B);
+  \draw (D) to [very thick, bend right=45] (B);
+
+  \draw (-0.6,0.0) node {\footnotesize$A_1$};
+  \draw ( 0.6,0.0) node {\footnotesize$A_2$};
+  \end{tikzpicture}
+
+  \end{tabular}
+  \end{center}
+
+  \noindent
+  On `paper' we can define the corresponding graph in terms of the disjoint 
+  union of the state nodes. Unfortunately in HOL, the standard definition for disjoint 
+  union, namely 
+  %
+  \begin{equation}\label{disjointunion}
+  @{term "UPLUS A\<^isub>1 A\<^isub>2 \<equiv> {(1, x) | x. x \<in> A\<^isub>1} \<union> {(2, y) | y. y \<in> A\<^isub>2}"}
+  \end{equation}
+
+  \noindent
+  changes the type---the disjoint union is not a set, but a set of pairs. 
+  Using this definition for disjoint union means we do not have a single type for automata
+  and hence will not be able to state certain properties about \emph{all}
+  automata, since there is no type quantification available in HOL (unlike in Coq, for example). An
+  alternative, which provides us with a single type for automata, is to give every 
+  state node an identity, for example a natural
+  number, and then be careful to rename these identities apart whenever
+  connecting two automata. This results in clunky proofs
+  establishing that properties are invariant under renaming. Similarly,
+  connecting two automata represented as matrices results in very adhoc
+  constructions, which are not pleasant to reason about.
+
+  Functions are much better supported in Isabelle/HOL, but they still lead to similar
+  problems as with graphs.  Composing, for example, two non-deterministic automata in parallel
+  requires also the formalisation of disjoint unions. Nipkow \cite{Nipkow98} 
+  dismisses for this the option of using identities, because it leads according to 
+  him to ``messy proofs''. He
+  opts for a variant of \eqref{disjointunion} using bit lists, but writes 
+
+  \begin{quote}
+  \it%
+  \begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
+  `` & All lemmas appear obvious given a picture of the composition of automata\ldots
+       Yet their proofs require a painful amount of detail.''
+  \end{tabular}
+  \end{quote}
+
+  \noindent
+  and
+  
+  \begin{quote}
+  \it%
+  \begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
+  `` & If the reader finds the above treatment in terms of bit lists revoltingly
+       concrete, I cannot disagree. A more abstract approach is clearly desirable.''
+  \end{tabular}
+  \end{quote}
+
+
+  \noindent
+  Moreover, it is not so clear how to conveniently impose a finiteness condition 
+  upon functions in order to represent \emph{finite} automata. The best is
+  probably to resort to more advanced reasoning frameworks, such as \emph{locales}
+  or \emph{type classes},
+  which are \emph{not} available in all HOL-based theorem provers.
+
+  {\bf add commnets from Brozowski}
+
+  Because of these problems to do with representing automata, there seems
+  to be no substantial formalisation of automata theory and regular languages 
+  carried out in HOL-based theorem provers. Nipkow  \cite{Nipkow98} establishes
+  the link between regular expressions and automata in
+  the context of lexing. Berghofer and Reiter \cite{BerghoferReiter09} 
+  formalise automata working over 
+  bit strings in the context of Presburger arithmetic.
+  The only larger formalisations of automata theory 
+  are carried out in Nuprl \cite{Constable00} and in Coq \cite{Filliatre97}.
+  
+  In this paper, we will not attempt to formalise automata theory in
+  Isabelle/HOL, but take a different approach to regular
+  languages. Instead of defining a regular language as one where there exists
+  an automaton that recognises all strings of the language, we define a
+  regular language as:
+
+  \begin{definition}
+  A language @{text A} is \emph{regular}, provided there is a regular expression that matches all
+  strings of @{text "A"}.
+  \end{definition}
+  
+  \noindent
+  The reason is that regular expressions, unlike graphs, matrices and functions, can
+  be easily defined as inductive datatype. Consequently a corresponding reasoning 
+  infrastructure comes for free. This has recently been exploited in HOL4 with a formalisation
+  of regular expression matching based on derivatives \cite{OwensSlind08} and
+  with an equivalence checker for regular expressions in Isabelle/HOL \cite{KraussNipkow11}.  
+  The purpose of this paper is to
+  show that a central result about regular languages---the Myhill-Nerode theorem---can 
+  be recreated by only using regular expressions. This theorem gives necessary
+  and sufficient conditions for when a language is regular. As a corollary of this
+  theorem we can easily establish the usual closure properties, including 
+  complementation, for regular languages.\smallskip
+  
+  \noindent
+  {\bf Contributions:} 
+  There is an extensive literature on regular languages.
+  To our best knowledge, our proof of the Myhill-Nerode theorem is the
+  first that is based on regular expressions, only. We prove the part of this theorem 
+  stating that a regular expression has only finitely many partitions using certain 
+  tagging-functions. Again to our best knowledge, these tagging-functions have
+  not been used before to establish the Myhill-Nerode theorem.
+*}
+
+section {* Preliminaries *}
+
+text {*
+  Strings in Isabelle/HOL are lists of characters with the \emph{empty string}
+  being represented by the empty list, written @{term "[]"}.  \emph{Languages}
+  are sets of strings. The language containing all strings is written in
+  Isabelle/HOL as @{term "UNIV::string set"}. The concatenation of two languages 
+  is written @{term "A ;; B"} and a language raised to the power @{text n} is written 
+  @{term "A \<up> n"}. They are defined as usual
+
+  \begin{center}
+  @{thm Seq_def[THEN eq_reflection, where A1="A" and B1="B"]}
+  \hspace{7mm}
+  @{thm pow.simps(1)[THEN eq_reflection, where A1="A"]}
+  \hspace{7mm}
+  @{thm pow.simps(2)[THEN eq_reflection, where A1="A" and n1="n"]}
+  \end{center}
+
+  \noindent
+  where @{text "@"} is the list-append operation. The Kleene-star of a language @{text A}
+  is defined as the union over all powers, namely @{thm Star_def}. In the paper
+  we will make use of the following properties of these constructions.
+  
+  \begin{proposition}\label{langprops}\mbox{}\\
+  \begin{tabular}{@ {}ll}
+  (i)   & @{thm star_cases}     \\ 
+  (ii)  & @{thm[mode=IfThen] pow_length}\\
+  (iii) & @{thm seq_Union_left} \\ 
+  \end{tabular}
+  \end{proposition}
+
+  \noindent
+  In @{text "(ii)"} we use the notation @{term "length s"} for the length of a
+  string; this property states that if \mbox{@{term "[] \<notin> A"}} then the lengths of
+  the strings in @{term "A \<up> (Suc n)"} must be longer than @{text n}.  We omit
+  the proofs for these properties, but invite the reader to consult our
+  formalisation.\footnote{Available at \url{http://www4.in.tum.de/~urbanc/regexp.html}}
+
+  The notation in Isabelle/HOL for the quotient of a language @{text A} according to an 
+  equivalence relation @{term REL} is @{term "A // REL"}. We will write 
+  @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined 
+  as \mbox{@{text "{y | y \<approx> x}"}}.
+
+
+  Central to our proof will be the solution of equational systems
+  involving equivalence classes of languages. For this we will use Arden's Lemma \cite{Brzozowski64},
+  which solves equations of the form @{term "X = A ;; X \<union> B"} provided
+  @{term "[] \<notin> A"}. However we will need the following `reverse' 
+  version of Arden's Lemma (`reverse' in the sense of changing the order of @{term "A ;; X"} to
+  \mbox{@{term "X ;; A"}}).
+
+  \begin{lemma}[Reverse Arden's Lemma]\label{arden}\mbox{}\\
+  If @{thm (prem 1) arden} then
+  @{thm (lhs) arden} if and only if
+  @{thm (rhs) arden}.
+  \end{lemma}
+
+  \begin{proof}
+  For the right-to-left direction we assume @{thm (rhs) arden} and show
+  that @{thm (lhs) arden} holds. From Prop.~\ref{langprops}@{text "(i)"} 
+  we have @{term "A\<star> = {[]} \<union> A ;; A\<star>"},
+  which is equal to @{term "A\<star> = {[]} \<union> A\<star> ;; A"}. Adding @{text B} to both 
+  sides gives @{term "B ;; A\<star> = B ;; ({[]} \<union> A\<star> ;; A)"}, whose right-hand side
+  is equal to @{term "(B ;; A\<star>) ;; A \<union> B"}. This completes this direction. 
+
+  For the other direction we assume @{thm (lhs) arden}. By a simple induction
+  on @{text n}, we can establish the property
+
+  \begin{center}
+  @{text "(*)"}\hspace{5mm} @{thm (concl) arden_helper}
+  \end{center}
+  
+  \noindent
+  Using this property we can show that @{term "B ;; (A \<up> n) \<subseteq> X"} holds for
+  all @{text n}. From this we can infer @{term "B ;; A\<star> \<subseteq> X"} using the definition
+  of @{text "\<star>"}.
+  For the inclusion in the other direction we assume a string @{text s}
+  with length @{text k} is an element in @{text X}. Since @{thm (prem 1) arden}
+  we know by Prop.~\ref{langprops}@{text "(ii)"} that 
+  @{term "s \<notin> X ;; (A \<up> Suc k)"} since its length is only @{text k}
+  (the strings in @{term "X ;; (A \<up> Suc k)"} are all longer). 
+  From @{text "(*)"} it follows then that
+  @{term s} must be an element in @{term "(\<Union>m\<in>{0..k}. B ;; (A \<up> m))"}. This in turn
+  implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Prop.~\ref{langprops}@{text "(iii)"} 
+  this is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed
+  \end{proof}
+
+  \noindent
+  Regular expressions are defined as the inductive datatype
+
+  \begin{center}
+  @{text r} @{text "::="}
+  @{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
+  @{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
+  @{term "CHAR c"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
+  @{term "SEQ r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
+  @{term "ALT r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
+  @{term "STAR r"}
+  \end{center}
+
+  \noindent
+  and the language matched by a regular expression is defined as
+
+  \begin{center}
+  \begin{tabular}{c@ {\hspace{10mm}}c}
+  \begin{tabular}{rcl}
+  @{thm (lhs) L_rexp.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) L_rexp.simps(1)}\\
+  @{thm (lhs) L_rexp.simps(2)} & @{text "\<equiv>"} & @{thm (rhs) L_rexp.simps(2)}\\
+  @{thm (lhs) L_rexp.simps(3)[where c="c"]} & @{text "\<equiv>"} & @{thm (rhs) L_rexp.simps(3)[where c="c"]}\\
+  \end{tabular}
+  &
+  \begin{tabular}{rcl}
+  @{thm (lhs) L_rexp.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]} & @{text "\<equiv>"} &
+       @{thm (rhs) L_rexp.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
+  @{thm (lhs) L_rexp.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]} & @{text "\<equiv>"} &
+       @{thm (rhs) L_rexp.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
+  @{thm (lhs) L_rexp.simps(6)[where r="r"]} & @{text "\<equiv>"} &
+      @{thm (rhs) L_rexp.simps(6)[where r="r"]}\\
+  \end{tabular}
+  \end{tabular}
+  \end{center}
+
+  Given a finite set of regular expressions @{text rs}, we will make use of the operation of generating 
+  a regular expression that matches the union of all languages of @{text rs}. We only need to know the 
+  existence
+  of such a regular expression and therefore we use Isabelle/HOL's @{const "fold_graph"} and Hilbert's
+  @{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This operation, roughly speaking, folds @{const ALT} over the 
+  set @{text rs} with @{const NULL} for the empty set. We can prove that for a finite set @{text rs}
+  %
+  \begin{equation}\label{uplus}
+  \mbox{@{thm (lhs) folds_alt_simp} @{text "= \<Union> (\<calL> ` rs)"}}
+  \end{equation}
+
+  \noindent
+  holds, whereby @{text "\<calL> ` rs"} stands for the 
+  image of the set @{text rs} under function @{text "\<calL>"}.
+*}
+
+
+section {* The Myhill-Nerode Theorem, First Part *}
+
+text {*
+  The key definition in the Myhill-Nerode theorem is the
+  \emph{Myhill-Nerode relation}, which states that w.r.t.~a language two 
+  strings are related, provided there is no distinguishing extension in this
+  language. This can be defined as a tertiary relation.
+
+  \begin{definition}[Myhill-Nerode Relation] Given a language @{text A}, two strings @{text x} and
+  @{text y} are Myhill-Nerode related provided
+  \begin{center}
+  @{thm str_eq_def[simplified str_eq_rel_def Pair_Collect]}
+  \end{center}
+  \end{definition}
+
+  \noindent
+  It is easy to see that @{term "\<approx>A"} is an equivalence relation, which
+  partitions the set of all strings, @{text "UNIV"}, into a set of disjoint
+  equivalence classes. To illustrate this quotient construction, let us give a simple 
+  example: consider the regular language containing just
+  the string @{text "[c]"}. The relation @{term "\<approx>({[c]})"} partitions @{text UNIV}
+  into three equivalence classes @{text "X\<^isub>1"}, @{text "X\<^isub>2"} and  @{text "X\<^isub>3"}
+  as follows
+  
+  \begin{center}
+  @{text "X\<^isub>1 = {[]}"}\hspace{5mm}
+  @{text "X\<^isub>2 = {[c]}"}\hspace{5mm}
+  @{text "X\<^isub>3 = UNIV - {[], [c]}"}
+  \end{center}
+
+  One direction of the Myhill-Nerode theorem establishes 
+  that if there are finitely many equivalence classes, like in the example above, then 
+  the language is regular. In our setting we therefore have to show:
+  
+  \begin{theorem}\label{myhillnerodeone}
+  @{thm[mode=IfThen] Myhill_Nerode1}
+  \end{theorem}
+
+  \noindent
+  To prove this theorem, we first define the set @{term "finals A"} as those equivalence
+  classes from @{term "UNIV // \<approx>A"} that contain strings of @{text A}, namely
+  %
+  \begin{equation} 
+  @{thm finals_def}
+  \end{equation}
+
+  \noindent
+  In our running example, @{text "X\<^isub>2"} is the only 
+  equivalence class in @{term "finals {[c]}"}.
+  It is straightforward to show that in general @{thm lang_is_union_of_finals} and 
+  @{thm finals_in_partitions} hold. 
+  Therefore if we know that there exists a regular expression for every
+  equivalence class in \mbox{@{term "finals A"}} (which by assumption must be
+  a finite set), then we can use @{text "\<bigplus>"} to obtain a regular expression 
+  that matches every string in @{text A}.
+
+
+  Our proof of Thm.~\ref{myhillnerodeone} relies on a method that can calculate a
+  regular expression for \emph{every} equivalence class, not just the ones 
+  in @{term "finals A"}. We
+  first define the notion of \emph{one-character-transition} between 
+  two equivalence classes
+  %
+  \begin{equation} 
+  @{thm transition_def}
+  \end{equation}
+
+  \noindent
+  which means that if we concatenate the character @{text c} to the end of all 
+  strings in the equivalence class @{text Y}, we obtain a subset of 
+  @{text X}. Note that we do not define an automaton here, we merely relate two sets
+  (with the help of a character). In our concrete example we have 
+  @{term "X\<^isub>1 \<Turnstile>c\<Rightarrow> X\<^isub>2"}, @{term "X\<^isub>1 \<Turnstile>d\<Rightarrow> X\<^isub>3"} with @{text d} being any 
+  other character than @{text c}, and @{term "X\<^isub>3 \<Turnstile>d\<Rightarrow> X\<^isub>3"} for any @{text d}.
+  
+  Next we construct an \emph{initial equational system} that
+  contains an equation for each equivalence class. We first give
+  an informal description of this construction. Suppose we have 
+  the equivalence classes @{text "X\<^isub>1,\<dots>,X\<^isub>n"}, there must be one and only one that
+  contains the empty string @{text "[]"} (since equivalence classes are disjoint).
+  Let us assume @{text "[] \<in> X\<^isub>1"}. We build the following equational system
+  
+  \begin{center}
+  \begin{tabular}{rcl}
+  @{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) + \<dots> + (Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) + \<lambda>(EMPTY)"} \\
+  @{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, CHAR c\<^isub>2\<^isub>1) + \<dots> + (Y\<^isub>2\<^isub>o, CHAR c\<^isub>2\<^isub>o)"} \\
+  & $\vdots$ \\
+  @{text "X\<^isub>n"} & @{text "="} & @{text "(Y\<^isub>n\<^isub>1, CHAR c\<^isub>n\<^isub>1) + \<dots> + (Y\<^isub>n\<^isub>q, CHAR c\<^isub>n\<^isub>q)"}\\
+  \end{tabular}
+  \end{center}
+
+  \noindent
+  where the terms @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"}
+  stand for all transitions @{term "Y\<^isub>i\<^isub>j \<Turnstile>c\<^isub>i\<^isub>j\<Rightarrow>
+  X\<^isub>i"}. 
+  %The intuition behind the equational system is that every 
+  %equation @{text "X\<^isub>i = rhs\<^isub>i"} in this system
+  %corresponds roughly to a state of an automaton whose name is @{text X\<^isub>i} and its predecessor states 
+  %are the @{text "Y\<^isub>i\<^isub>j"}; the @{text "c\<^isub>i\<^isub>j"} are the labels of the transitions from these 
+  %predecessor states to @{text X\<^isub>i}. 
+  There can only be
+  finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} in a right-hand side 
+  since by assumption there are only finitely many
+  equivalence classes and only finitely many characters.
+  The term @{text "\<lambda>(EMPTY)"} in the first equation acts as a marker for the initial state, that
+  is the equivalence class
+  containing @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
+  single `initial' state in the equational system, which is different from
+  the method by Brzozowski \cite{Brzozowski64}, where he marks the
+  `terminal' states. We are forced to set up the equational system in our
+  way, because the Myhill-Nerode relation determines the `direction' of the
+  transitions---the successor `state' of an equivalence class @{text Y} can
+  be reached by adding a character to the end of @{text Y}. This is also the
+  reason why we have to use our reverse version of Arden's Lemma.}
+  %In our initial equation system there can only be
+  %finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} in a right-hand side 
+  %since by assumption there are only finitely many
+  %equivalence classes and only finitely many characters. 
+  Overloading the function @{text \<calL>} for the two kinds of terms in the
+  equational system, we have
+  
+  \begin{center}
+  @{text "\<calL>(Y, r) \<equiv>"} %
+  @{thm (rhs) L_rhs_trm.simps(2)[where X="Y" and r="r", THEN eq_reflection]}\hspace{10mm}
+  @{thm L_rhs_trm.simps(1)[where r="r", THEN eq_reflection]}
+  \end{center}
+
+  \noindent
+  and we can prove for @{text "X\<^isub>2\<^isub>.\<^isub>.\<^isub>n"} that the following equations
+  %
+  \begin{equation}\label{inv1}
+  @{text "X\<^isub>i = \<calL>(Y\<^isub>i\<^isub>1, CHAR c\<^isub>i\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>i\<^isub>q, CHAR c\<^isub>i\<^isub>q)"}.
+  \end{equation}
+
+  \noindent
+  hold. Similarly for @{text "X\<^isub>1"} we can show the following equation
+  %
+  \begin{equation}\label{inv2}
+  @{text "X\<^isub>1 = \<calL>(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) \<union> \<calL>(\<lambda>(EMPTY))"}.
+  \end{equation}
+
+  \noindent
+  holds. The reason for adding the @{text \<lambda>}-marker to our initial equational system is 
+  to obtain this equation: it only holds with the marker, since none of 
+  the other terms contain the empty string. The point of the initial equational system is
+  that solving it means we will be able to extract a regular expression for every equivalence class. 
+
+  Our representation for the equations in Isabelle/HOL are pairs,
+  where the first component is an equivalence class (a set of strings)
+  and the second component
+  is a set of terms. Given a set of equivalence
+  classes @{text CS}, our initial equational system @{term "Init CS"} is thus 
+  formally defined as
+  %
+  \begin{equation}\label{initcs}
+  \mbox{\begin{tabular}{rcl}     
+  @{thm (lhs) Init_rhs_def} & @{text "\<equiv>"} & 
+  @{text "if"}~@{term "[] \<in> X"}\\
+  & & @{text "then"}~@{term "{Trn Y (CHAR c) | Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X} \<union> {Lam EMPTY}"}\\
+  & & @{text "else"}~@{term "{Trn Y (CHAR c)| Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X}"}\\
+  @{thm (lhs) Init_def}     & @{text "\<equiv>"} & @{thm (rhs) Init_def}
+  \end{tabular}}
+  \end{equation}
+
+  
+
+  \noindent
+  Because we use sets of terms 
+  for representing the right-hand sides of equations, we can 
+  prove \eqref{inv1} and \eqref{inv2} more concisely as
+  %
+  \begin{lemma}\label{inv}
+  If @{thm (prem 1) test} then @{text "X = \<Union> \<calL> ` rhs"}.
+  \end{lemma}
+
+  \noindent
+  Our proof of Thm.~\ref{myhillnerodeone} will proceed by transforming the
+  initial equational system into one in \emph{solved form} maintaining the invariant
+  in Lem.~\ref{inv}. From the solved form we will be able to read
+  off the regular expressions. 
+
+  In order to transform an equational system into solved form, we have two 
+  operations: one that takes an equation of the form @{text "X = rhs"} and removes
+  any recursive occurrences of @{text X} in the @{text rhs} using our variant of Arden's 
+  Lemma. The other operation takes an equation @{text "X = rhs"}
+  and substitutes @{text X} throughout the rest of the equational system
+  adjusting the remaining regular expressions appropriately. To define this adjustment 
+  we define the \emph{append-operation} taking a term and a regular expression as argument
+
+  \begin{center}
+  @{thm Append_rexp.simps(2)[where X="Y" and r="r\<^isub>1" and rexp="r\<^isub>2", THEN eq_reflection]}\hspace{10mm}
+  @{thm Append_rexp.simps(1)[where r="r\<^isub>1" and rexp="r\<^isub>2", THEN eq_reflection]}
+  \end{center}
+
+  \noindent
+  We lift this operation to entire right-hand sides of equations, written as
+  @{thm (lhs) Append_rexp_rhs_def[where rexp="r"]}. With this we can define
+  the \emph{arden-operation} for an equation of the form @{text "X = rhs"} as:
+  %
+  \begin{equation}\label{arden_def}
+  \mbox{\begin{tabular}{rc@ {\hspace{2mm}}r@ {\hspace{1mm}}l}
+  @{thm (lhs) Arden_def} & @{text "\<equiv>"}~~\mbox{} & \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "let"}}\\ 
+   & & @{text "rhs' ="} & @{term "rhs - {Trn X r | r. Trn X r \<in> rhs}"} \\
+   & & @{text "r' ="}   & @{term "STAR (\<Uplus> {r. Trn X r \<in> rhs})"}\\
+   & &  \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "in"}~~@{term "append_rhs_rexp rhs' r'"}}\\ 
+  \end{tabular}}
+  \end{equation}
+
+  \noindent
+  In this definition, we first delete all terms of the form @{text "(X, r)"} from @{text rhs};
+  then we calculate the combined regular expressions for all @{text r} coming 
+  from the deleted @{text "(X, r)"}, and take the @{const STAR} of it;
+  finally we append this regular expression to @{text rhs'}. It can be easily seen 
+  that this operation mimics Arden's Lemma on the level of equations. To ensure
+  the non-emptiness condition of Arden's Lemma we say that a right-hand side is
+  @{text ardenable} provided
+
+  \begin{center}
+  @{thm ardenable_def}
+  \end{center}
+
+  \noindent
+  This allows us to prove a version of Arden's Lemma on the level of equations.
+
+  \begin{lemma}\label{ardenable}
+  Given an equation @{text "X = rhs"}.
+  If @{text "X = \<Union>\<calL> ` rhs"},
+  @{thm (prem 2) Arden_keeps_eq}, and
+  @{thm (prem 3) Arden_keeps_eq}, then
+  @{text "X = \<Union>\<calL> ` (Arden X rhs)"}.
+  \end{lemma}
+  
+  \noindent
+  Our @{text ardenable} condition is slightly stronger than needed for applying Arden's Lemma,
+  but we can still ensure that it holds troughout our algorithm of transforming equations
+  into solved form. The \emph{substitution-operation} takes an equation
+  of the form @{text "X = xrhs"} and substitutes it into the right-hand side @{text rhs}.
+
+  \begin{center}
+  \begin{tabular}{rc@ {\hspace{2mm}}r@ {\hspace{1mm}}l}
+  @{thm (lhs) Subst_def} & @{text "\<equiv>"}~~\mbox{} & \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "let"}}\\ 
+   & & @{text "rhs' ="} & @{term "rhs - {Trn X r | r. Trn X r \<in> rhs}"} \\
+   & & @{text "r' ="}   & @{term "\<Uplus> {r. Trn X r \<in> rhs}"}\\
+   & &  \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "in"}~~@{term "rhs' \<union> append_rhs_rexp xrhs r'"}}\\ 
+  \end{tabular}
+  \end{center}
+
+  \noindent
+  We again delete first all occurrences of @{text "(X, r)"} in @{text rhs}; we then calculate
+  the regular expression corresponding to the deleted terms; finally we append this
+  regular expression to @{text "xrhs"} and union it up with @{text rhs'}. When we use
+  the substitution operation we will arrange it so that @{text "xrhs"} does not contain
+  any occurrence of @{text X}.
+
+  With these two operations in place, we can define the operation that removes one equation
+  from an equational systems @{text ES}. The operation @{const Subst_all}
+  substitutes an equation @{text "X = xrhs"} throughout an equational system @{text ES}; 
+  @{const Remove} then completely removes such an equation from @{text ES} by substituting 
+  it to the rest of the equational system, but first eliminating all recursive occurrences
+  of @{text X} by applying @{const Arden} to @{text "xrhs"}.
+
+  \begin{center}
+  \begin{tabular}{rcl}
+  @{thm (lhs) Subst_all_def} & @{text "\<equiv>"} & @{thm (rhs) Subst_all_def}\\
+  @{thm (lhs) Remove_def}    & @{text "\<equiv>"} & @{thm (rhs) Remove_def}
+  \end{tabular}
+  \end{center}
+
+  \noindent
+  Finally, we can define how an equational system should be solved. For this 
+  we will need to iterate the process of eliminating equations until only one equation
+  will be left in the system. However, we do not just want to have any equation
+  as being the last one, but the one involving the equivalence class for 
+  which we want to calculate the regular 
+  expression. Let us suppose this equivalence class is @{text X}. 
+  Since @{text X} is the one to be solved, in every iteration step we have to pick an
+  equation to be eliminated that is different from @{text X}. In this way 
+  @{text X} is kept to the final step. The choice is implemented using Hilbert's choice 
+  operator, written @{text SOME} in the definition below.
+  
+  \begin{center}
+  \begin{tabular}{rc@ {\hspace{4mm}}r@ {\hspace{1mm}}l}
+  @{thm (lhs) Iter_def} & @{text "\<equiv>"}~~\mbox{} & \multicolumn{2}{@ {\hspace{-4mm}}l}{@{text "let"}}\\ 
+   & & @{text "(Y, yrhs) ="} & @{term "SOME (Y, yrhs). (Y, yrhs) \<in> ES \<and> X \<noteq> Y"} \\
+   & &  \multicolumn{2}{@ {\hspace{-4mm}}l}{@{text "in"}~~@{term "Remove ES Y yrhs"}}\\ 
+  \end{tabular}
+  \end{center}
+
+  \noindent
+  The last definition we need applies @{term Iter} over and over until a condition 
+  @{text Cond} is \emph{not} satisfied anymore. This condition states that there
+  are more than one equation left in the equational system @{text ES}. To solve
+  an equational system we use Isabelle/HOL's @{text while}-operator as follows:
+  
+  \begin{center}
+  @{thm Solve_def}
+  \end{center}
+
+  \noindent
+  We are not concerned here with the definition of this operator
+  (see Berghofer and Nipkow \cite{BerghoferNipkow00}), but note that we eliminate
+  in each @{const Iter}-step a single equation, and therefore 
+  have a well-founded termination order by taking the cardinality 
+  of the equational system @{text ES}. This enables us to prove
+  properties about our definition of @{const Solve} when we `call' it with
+  the equivalence class @{text X} and the initial equational system 
+  @{term "Init (UNIV // \<approx>A)"} from
+  \eqref{initcs} using the principle:
+  %
+  \begin{equation}\label{whileprinciple}
+  \mbox{\begin{tabular}{l}
+  @{term "invariant (Init (UNIV // \<approx>A))"} \\
+  @{term "\<forall>ES. invariant ES \<and> Cond ES \<longrightarrow> invariant (Iter X ES)"}\\
+  @{term "\<forall>ES. invariant ES \<and> Cond ES \<longrightarrow> card (Iter X ES) < card ES"}\\
+  @{term "\<forall>ES. invariant ES \<and> \<not> Cond ES \<longrightarrow> P ES"}\\
+  \hline
+  \multicolumn{1}{c}{@{term "P (Solve X (Init (UNIV // \<approx>A)))"}}
+  \end{tabular}}
+  \end{equation}
+
+  \noindent
+  This principle states that given an invariant (which we will specify below) 
+  we can prove a property
+  @{text "P"} involving @{const Solve}. For this we have to discharge the following
+  proof obligations: first the
+  initial equational system satisfies the invariant; second the iteration
+  step @{text "Iter"} preserves the invariant as long as the condition @{term Cond} holds;
+  third @{text "Iter"} decreases the termination order, and fourth that
+  once the condition does not hold anymore then the property @{text P} must hold.
+
+  The property @{term P} in our proof will state that @{term "Solve X (Init (UNIV // \<approx>A))"}
+  returns with a single equation @{text "X = xrhs"} for some @{text "xrhs"}, and
+  that this equational system still satisfies the invariant. In order to get
+  the proof through, the invariant is composed of the following six properties:
+
+  \begin{center}
+  \begin{tabular}{@ {}rcl@ {\hspace{-13mm}}l @ {}}
+  @{text "invariant ES"} & @{text "\<equiv>"} &
+      @{term "finite ES"} & @{text "(finiteness)"}\\
+  & @{text "\<and>"} & @{thm (rhs) finite_rhs_def} & @{text "(finiteness rhs)"}\\
+  & @{text "\<and>"} & @{text "\<forall>(X, rhs)\<in>ES. X = \<Union>\<calL> ` rhs"} & @{text "(soundness)"}\\
+  & @{text "\<and>"} & @{thm (rhs) distinctness_def}\\
+  &             &  & @{text "(distinctness)"}\\
+  & @{text "\<and>"} & @{thm (rhs) ardenable_all_def} & @{text "(ardenable)"}\\   
+  & @{text "\<and>"} & @{thm (rhs) validity_def} & @{text "(validity)"}\\
+  \end{tabular}
+  \end{center}
+ 
+  \noindent
+  The first two ensure that the equational system is always finite (number of equations
+  and number of terms in each equation); the third makes sure the `meaning' of the 
+  equations is preserved under our transformations. The other properties are a bit more
+  technical, but are needed to get our proof through. Distinctness states that every
+  equation in the system is distinct. @{text Ardenable} ensures that we can always
+  apply the @{text Arden} operation. 
+  The last property states that every @{text rhs} can only contain equivalence classes
+  for which there is an equation. Therefore @{text lhss} is just the set containing 
+  the first components of an equational system,
+  while @{text "rhss"} collects all equivalence classes @{text X} in the terms of the 
+  form @{term "Trn X r"}. That means formally @{thm (lhs) lhss_def}~@{text "\<equiv> {X | (X, rhs) \<in> ES}"} 
+  and @{thm (lhs) rhss_def}~@{text "\<equiv> {X | (X, r) \<in> rhs}"}.
+  
+
+  It is straightforward to prove that the initial equational system satisfies the
+  invariant.
+
+  \begin{lemma}\label{invzero}
+  @{thm[mode=IfThen] Init_ES_satisfies_invariant}
+  \end{lemma}
+
+  \begin{proof}
+  Finiteness is given by the assumption and the way how we set up the 
+  initial equational system. Soundness is proved in Lem.~\ref{inv}. Distinctness
+  follows from the fact that the equivalence classes are disjoint. The @{text ardenable}
+  property also follows from the setup of the initial equational system, as does 
+  validity.\qed
+  \end{proof}
+
+  \noindent
+  Next we show that @{text Iter} preserves the invariant.
+
+  \begin{lemma}\label{iterone}
+  @{thm[mode=IfThen] iteration_step_invariant[where xrhs="rhs"]}
+  \end{lemma}
+
+  \begin{proof} 
+  The argument boils down to choosing an equation @{text "Y = yrhs"} to be eliminated
+  and to show that @{term "Subst_all (ES - {(Y, yrhs)}) Y (Arden Y yrhs)"} 
+  preserves the invariant.
+  We prove this as follows:
+
+  \begin{center}
+  @{text "\<forall> ES."} @{thm (prem 1) Subst_all_satisfies_invariant} implies
+  @{thm (concl) Subst_all_satisfies_invariant}
+  \end{center}
+
+  \noindent
+  Finiteness is straightforward, as the @{const Subst} and @{const Arden} operations 
+  keep the equational system finite. These operations also preserve soundness 
+  and distinctness (we proved soundness for @{const Arden} in Lem.~\ref{ardenable}).
+  The property @{text ardenable} is clearly preserved because the append-operation
+  cannot make a regular expression to match the empty string. Validity is
+  given because @{const Arden} removes an equivalence class from @{text yrhs}
+  and then @{const Subst_all} removes @{text Y} from the equational system.
+  Having proved the implication above, we can instantiate @{text "ES"} with @{text "ES - {(Y, yrhs)}"}
+  which matches with our proof-obligation of @{const "Subst_all"}. Since
+  \mbox{@{term "ES = ES - {(Y, yrhs)} \<union> {(Y, yrhs)}"}}, we can use the assumption 
+  to complete the proof.\qed 
+  \end{proof}
+
+  \noindent
+  We also need the fact that @{text Iter} decreases the termination measure.
+
+  \begin{lemma}\label{itertwo}
+  @{thm[mode=IfThen] iteration_step_measure[simplified (no_asm), where xrhs="rhs"]}
+  \end{lemma}
+
+  \begin{proof}
+  By assumption we know that @{text "ES"} is finite and has more than one element.
+  Therefore there must be an element @{term "(Y, yrhs) \<in> ES"} with 
+  @{term "(Y, yrhs) \<noteq> (X, rhs)"}. Using the distinctness property we can infer
+  that @{term "Y \<noteq> X"}. We further know that @{text "Remove ES Y yrhs"}
+  removes the equation @{text "Y = yrhs"} from the system, and therefore 
+  the cardinality of @{const Iter} strictly decreases.\qed
+  \end{proof}
+
+  \noindent
+  This brings us to our property we want to establish for @{text Solve}.
+
+
+  \begin{lemma}
+  If @{thm (prem 1) Solve} and @{thm (prem 2) Solve} then there exists
+  a @{text rhs} such that  @{term "Solve X (Init (UNIV // \<approx>A)) = {(X, rhs)}"}
+  and @{term "invariant {(X, rhs)}"}.
+  \end{lemma}
+
+  \begin{proof} 
+  In order to prove this lemma using \eqref{whileprinciple}, we have to use a slightly
+  stronger invariant since Lem.~\ref{iterone} and \ref{itertwo} have the precondition 
+  that @{term "(X, rhs) \<in> ES"} for some @{text rhs}. This precondition is needed
+  in order to choose in the @{const Iter}-step an equation that is not \mbox{@{term "X = rhs"}}.
+  Therefore our invariant cannot be just @{term "invariant ES"}, but must be 
+  @{term "invariant ES \<and> (\<exists>rhs. (X, rhs) \<in> ES)"}. By assumption 
+  @{thm (prem 2) Solve} and Lem.~\ref{invzero}, the more general invariant holds for
+  the initial equational system. This is premise 1 of~\eqref{whileprinciple}.
+  Premise 2 is given by Lem.~\ref{iterone} and the fact that @{const Iter} might
+  modify the @{text rhs} in the equation @{term "X = rhs"}, but does not remove it.
+  Premise 3 of~\eqref{whileprinciple} is by Lem.~\ref{itertwo}. Now in premise 4
+  we like to show that there exists a @{text rhs} such that @{term "ES = {(X, rhs)}"}
+  and that @{text "invariant {(X, rhs)}"} holds, provided the condition @{text "Cond"}
+  does not holds. By the stronger invariant we know there exists such a @{text "rhs"}
+  with @{term "(X, rhs) \<in> ES"}. Because @{text Cond} is not true, we know the cardinality
+  of @{text ES} is @{text 1}. This means @{text "ES"} must actually be the set @{text "{(X, rhs)}"},
+  for which the invariant holds. This allows us to conclude that 
+  @{term "Solve X (Init (UNIV // \<approx>A)) = {(X, rhs)}"} and @{term "invariant {(X, rhs)}"} hold,
+  as needed.\qed
+  \end{proof}
+
+  \noindent
+  With this lemma in place we can show that for every equivalence class in @{term "UNIV // \<approx>A"}
+  there exists a regular expression.
+
+  \begin{lemma}\label{every_eqcl_has_reg}
+  @{thm[mode=IfThen] every_eqcl_has_reg}
+  \end{lemma}
+
+  \begin{proof}
+  By the preceding lemma, we know that there exists a @{text "rhs"} such
+  that @{term "Solve X (Init (UNIV // \<approx>A))"} returns the equation @{text "X = rhs"},
+  and that the invariant holds for this equation. That means we 
+  know @{text "X = \<Union>\<calL> ` rhs"}. We further know that
+  this is equal to \mbox{@{text "\<Union>\<calL> ` (Arden X rhs)"}} using the properties of the 
+  invariant and Lem.~\ref{ardenable}. Using the validity property for the equation @{text "X = rhs"},
+  we can infer that @{term "rhss rhs \<subseteq> {X}"} and because the @{text Arden} operation
+  removes that @{text X} from @{text rhs}, that @{term "rhss (Arden X rhs) = {}"}.
+  This means the right-hand side @{term "Arden X rhs"} can only consist of terms of the form @{term "Lam r"}.
+  So we can collect those (finitely many) regular expressions @{text rs} and have @{term "X = L (\<Uplus>rs)"}.
+  With this we can conclude the proof.\qed
+  \end{proof}
+
+  \noindent
+  Lem.~\ref{every_eqcl_has_reg} allows us to finally give a proof for the first direction
+  of the Myhill-Nerode theorem.
+
+  \begin{proof}[of Thm.~\ref{myhillnerodeone}]
+  By Lem.~\ref{every_eqcl_has_reg} we know that there exists a regular expression for
+  every equivalence class in @{term "UNIV // \<approx>A"}. Since @{text "finals A"} is
+  a subset of  @{term "UNIV // \<approx>A"}, we also know that for every equivalence class
+  in @{term "finals A"} there exists a regular expression. Moreover by assumption 
+  we know that @{term "finals A"} must be finite, and therefore there must be a finite
+  set of regular expressions @{text "rs"} such that
+  @{term "\<Union>(finals A) = L (\<Uplus>rs)"}.
+  Since the left-hand side is equal to @{text A}, we can use @{term "\<Uplus>rs"} 
+  as the regular expression that is needed in the theorem.\qed
+  \end{proof}
+*}
+
+
+
+
+section {* Myhill-Nerode, Second Part *}
+
+text {*
+  We will prove in this section the second part of the Myhill-Nerode
+  theorem. It can be formulated in our setting as follows:
+
+  \begin{theorem}
+  Given @{text "r"} is a regular expression, then @{thm Myhill_Nerode2}.
+  \end{theorem}  
+
+  \noindent
+  The proof will be by induction on the structure of @{text r}. It turns out
+  the base cases are straightforward.
+
+
+  \begin{proof}[Base Cases]
+  The cases for @{const NULL}, @{const EMPTY} and @{const CHAR} are routine, because 
+  we can easily establish that
+
+  \begin{center}
+  \begin{tabular}{l}
+  @{thm quot_null_eq}\\
+  @{thm quot_empty_subset}\\
+  @{thm quot_char_subset}
+  \end{tabular}
+  \end{center}
+
+  \noindent
+  hold, which shows that @{term "UNIV // \<approx>(L r)"} must be finite.\qed
+  \end{proof}
+
+  \noindent
+  Much more interesting, however, are the inductive cases. They seem hard to solve 
+  directly. The reader is invited to try. 
+
+  Our proof will rely on some
+  \emph{tagging-functions} defined over strings. Given the inductive hypothesis, it will 
+  be easy to prove that the \emph{range} of these tagging-functions is finite
+  (the range of a function @{text f} is defined as @{text "range f \<equiv> f ` UNIV"}).
+  With this we will be able to infer that the tagging-functions, seen as relations,
+  give rise to finitely many equivalence classes of @{const UNIV}. Finally we 
+  will show that the tagging-relations are more refined than @{term "\<approx>(L r)"}, which
+  implies that @{term "UNIV // \<approx>(L r)"} must also be finite (a relation @{text "R\<^isub>1"} 
+  is said to \emph{refine} @{text "R\<^isub>2"} provided @{text "R\<^isub>1 \<subseteq> R\<^isub>2"}).
+  We formally define the notion of a \emph{tagging-relation} as follows.
+
+  \begin{definition}[Tagging-Relation] Given a tagging-function @{text tag}, then two strings @{text x}
+  and @{text y} are \emph{tag-related} provided
+  \begin{center}
+  @{text "x =tag= y \<equiv> tag x = tag y"}\;.
+  \end{center}
+  \end{definition}
+
+
+  In order to establish finiteness of a set @{text A}, we shall use the following powerful
+  principle from Isabelle/HOL's library.
+  %
+  \begin{equation}\label{finiteimageD}
+  @{thm[mode=IfThen] finite_imageD}
+  \end{equation}
+
+  \noindent
+  It states that if an image of a set under an injective function @{text f} (injective over this set) 
+  is finite, then the set @{text A} itself must be finite. We can use it to establish the following 
+  two lemmas.
+
+  \begin{lemma}\label{finone}
+  @{thm[mode=IfThen] finite_eq_tag_rel}
+  \end{lemma}
+
+  \begin{proof}
+  We set in \eqref{finiteimageD}, @{text f} to be @{text "X \<mapsto> tag ` X"}. We have
+  @{text "range f"} to be a subset of @{term "Pow (range tag)"}, which we know must be
+  finite by assumption. Now @{term "f (UNIV // =tag=)"} is a subset of @{text "range f"},
+  and so also finite. Injectivity amounts to showing that @{text "X = Y"} under the
+  assumptions that @{text "X, Y \<in> "}~@{term "UNIV // =tag="} and @{text "f X = f Y"}.
+  From the assumptions we can obtain @{text "x \<in> X"} and @{text "y \<in> Y"} with 
+  @{text "tag x = tag y"}. Since @{text x} and @{text y} are tag-related, this in 
+  turn means that the equivalence classes @{text X}
+  and @{text Y} must be equal.\qed
+  \end{proof}
+
+  \begin{lemma}\label{fintwo} 
+  Given two equivalence relations @{text "R\<^isub>1"} and @{text "R\<^isub>2"}, whereby
+  @{text "R\<^isub>1"} refines @{text "R\<^isub>2"}.
+  If @{thm (prem 1) refined_partition_finite[where ?R1.0="R\<^isub>1" and ?R2.0="R\<^isub>2"]}
+  then @{thm (concl) refined_partition_finite[where ?R1.0="R\<^isub>1" and ?R2.0="R\<^isub>2"]}.
+  \end{lemma}
+
+  \begin{proof}
+  We prove this lemma again using \eqref{finiteimageD}. This time we set @{text f} to
+  be @{text "X \<mapsto>"}~@{term "{R\<^isub>1 `` {x} | x. x \<in> X}"}. It is easy to see that 
+  @{term "finite (f ` (UNIV // R\<^isub>2))"} because it is a subset of @{term "Pow (UNIV // R\<^isub>1)"},
+  which is finite by assumption. What remains to be shown is that @{text f} is injective
+  on @{term "UNIV // R\<^isub>2"}. This is equivalent to showing that two equivalence 
+  classes, say @{text "X"} and @{text Y}, in @{term "UNIV // R\<^isub>2"} are equal, provided
+  @{text "f X = f Y"}. For @{text "X = Y"} to be equal, we have to find two elements
+  @{text "x \<in> X"} and @{text "y \<in> Y"} such that they are @{text R\<^isub>2} related.
+  We know there exists a @{text "x \<in> X"} with \mbox{@{term "X = R\<^isub>2 `` {x}"}}. 
+  From the latter fact we can infer that @{term "R\<^isub>1 ``{x} \<in> f X"}
+  and further @{term "R\<^isub>1 ``{x} \<in> f Y"}. This means we can obtain a @{text y}
+  such that @{term "R\<^isub>1 `` {x} = R\<^isub>1 `` {y}"} holds. Consequently @{text x} and @{text y}
+  are @{text "R\<^isub>1"}-related. Since by assumption @{text "R\<^isub>1"} refines @{text "R\<^isub>2"},
+  they must also be @{text "R\<^isub>2"}-related, as we need to show.\qed
+  \end{proof}
+
+  \noindent
+  Chaining Lem.~\ref{finone} and \ref{fintwo} together, means in order to show
+  that @{term "UNIV // \<approx>(L r)"} is finite, we have to find a tagging-function whose
+  range can be shown to be finite and whose tagging-relation refines @{term "\<approx>(L r)"}.
+  Let us attempt the @{const ALT}-case first.
+
+  \begin{proof}[@{const "ALT"}-Case] 
+  We take as tagging-function 
+  %
+  \begin{center}
+  @{thm tag_str_ALT_def[where A="A" and B="B", THEN meta_eq_app]}
+  \end{center}
+
+  \noindent
+  where @{text "A"} and @{text "B"} are some arbitrary languages.
+  We can show in general, if @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
+  then @{term "finite ((UNIV // \<approx>A) \<times> (UNIV // \<approx>B))"} holds. The range of
+  @{term "tag_str_ALT A B"} is a subset of this product set---so finite. It remains to be shown
+  that @{text "=tag\<^isub>A\<^isub>L\<^isub>T A B="} refines @{term "\<approx>(A \<union> B)"}. This amounts to 
+  showing
+  %
+  \begin{center}
+  @{term "tag\<^isub>A\<^isub>L\<^isub>T A B x = tag\<^isub>A\<^isub>L\<^isub>T A B y \<longrightarrow> x \<approx>(A \<union> B) y"}
+  \end{center}
+  %
+  \noindent
+  which by unfolding the Myhill-Nerode relation is identical to
+  %
+  \begin{equation}\label{pattern}
+  @{text "\<forall>z. tag\<^isub>A\<^isub>L\<^isub>T A B x = tag\<^isub>A\<^isub>L\<^isub>T A B y \<and> x @ z \<in> A \<union> B \<longrightarrow> y @ z \<in> A \<union> B"}
+  \end{equation}
+  %
+  \noindent
+  since both @{text "=tag\<^isub>A\<^isub>L\<^isub>T A B="} and @{term "\<approx>(A \<union> B)"} are symmetric. To solve
+  \eqref{pattern} we just have to unfold the definition of the tagging-function and analyse 
+  in which set, @{text A} or @{text B}, the string @{term "x @ z"} is. 
+  The definition of the tagging-function will give us in each case the
+  information to infer that @{text "y @ z \<in> A \<union> B"}.
+  Finally we
+  can discharge this case by setting @{text A} to @{term "L r\<^isub>1"} and @{text B} to @{term "L r\<^isub>2"}.\qed
+  \end{proof}
+
+
+  \noindent
+  The pattern in \eqref{pattern} is repeated for the other two cases. Unfortunately,
+  they are slightly more complicated. In the @{const SEQ}-case we essentially have
+  to be able to infer that 
+  %
+  \begin{center}
+  @{text "\<dots>"}@{term "x @ z \<in> A ;; B \<longrightarrow> y @ z \<in> A ;; B"}
+  \end{center}
+  %
+  \noindent
+  using the information given by the appropriate tagging-function. The complication 
+  is to find out what the possible splits of @{text "x @ z"} are to be in @{term "A ;; B"}
+  (this was easy in case of @{term "A \<union> B"}). To deal with this complication we define the
+  notions of \emph{string prefixes} 
+  %
+  \begin{center}
+  @{text "x \<le> y \<equiv> \<exists>z. y = x @ z"}\hspace{10mm}
+  @{text "x < y \<equiv> x \<le> y \<and> x \<noteq> y"}
+  \end{center}
+  %
+  \noindent
+  and \emph{string subtraction}:
+  %
+  \begin{center}
+  @{text "[] - y \<equiv> []"}\hspace{10mm}
+  @{text "x - [] \<equiv> x"}\hspace{10mm}
+  @{text "cx - dy \<equiv> if c = d then x - y else cx"}
+  \end{center}
+  %
+  \noindent
+  where @{text c} and @{text d} are characters, and @{text x} and @{text y} are strings. 
+  
+  Now assuming  @{term "x @ z \<in> A ;; B"} there are only two possible ways of how to `split' 
+  this string to be in @{term "A ;; B"}:
+  %
+  \begin{center}
+  \begin{tabular}{@ {}c@ {\hspace{10mm}}c@ {}}
+  \scalebox{0.7}{
+  \begin{tikzpicture}
+    \node[draw,minimum height=3.8ex] (xa) { $\hspace{3em}@{text "x'"}\hspace{3em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of xa] (xxa) { $\hspace{0.2em}@{text "x - x'"}\hspace{0.2em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of xxa] (z) { $\hspace{5em}@{text z}\hspace{5em}$ };
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           (xa.north west) -- ($(xxa.north east)+(0em,0em)$)
+               node[midway, above=0.5em]{@{text x}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           (z.north west) -- ($(z.north east)+(0em,0em)$)
+               node[midway, above=0.5em]{@{text z}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           ($(xa.north west)+(0em,3ex)$) -- ($(z.north east)+(0em,3ex)$)
+               node[midway, above=0.8em]{@{term "x @ z \<in> A ;; B"}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           ($(z.south east)+(0em,0ex)$) -- ($(xxa.south west)+(0em,0ex)$)
+               node[midway, below=0.5em]{@{term "(x - x') @ z \<in> B"}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           ($(xa.south east)+(0em,0ex)$) -- ($(xa.south west)+(0em,0ex)$)
+               node[midway, below=0.5em]{@{term "x' \<in> A"}};
+  \end{tikzpicture}}
+  &
+  \scalebox{0.7}{
+  \begin{tikzpicture}
+    \node[draw,minimum height=3.8ex] (x) { $\hspace{4.8em}@{text x}\hspace{4.8em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of x] (za) { $\hspace{0.6em}@{text "z'"}\hspace{0.6em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of za] (zza) { $\hspace{2.6em}@{text "z - z'"}\hspace{2.6em}$  };
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           (x.north west) -- ($(za.north west)+(0em,0em)$)
+               node[midway, above=0.5em]{@{text x}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           ($(za.north west)+(0em,0ex)$) -- ($(zza.north east)+(0em,0ex)$)
+               node[midway, above=0.5em]{@{text z}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           ($(x.north west)+(0em,3ex)$) -- ($(zza.north east)+(0em,3ex)$)
+               node[midway, above=0.8em]{@{term "x @ z \<in> A ;; B"}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           ($(za.south east)+(0em,0ex)$) -- ($(x.south west)+(0em,0ex)$)
+               node[midway, below=0.5em]{@{text "x @ z' \<in> A"}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           ($(zza.south east)+(0em,0ex)$) -- ($(za.south east)+(0em,0ex)$)
+               node[midway, below=0.5em]{@{text "(z - z') \<in> B"}};
+  \end{tikzpicture}}
+  \end{tabular}
+  \end{center}
+  %
+  \noindent
+  Either there is a prefix of @{text x} in @{text A} and the rest is in @{text B} (first picture),
+  or @{text x} and a prefix of @{text "z"} is in @{text A} and the rest in @{text B} (second picture).
+  In both cases we have to show that @{term "y @ z \<in> A ;; B"}. For this we use the 
+  following tagging-function
+  %
+  \begin{center}
+  @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B", THEN meta_eq_app]}
+  \end{center}
+
+  \noindent
+  with the idea that in the first split we have to make sure that @{text "(x - x') @ z"}
+  is in the language @{text B}.
+
+  \begin{proof}[@{const SEQ}-Case]
+  If @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
+  then @{term "finite ((UNIV // \<approx>A) \<times> (Pow (UNIV // \<approx>B)))"} holds. The range of
+  @{term "tag_str_SEQ A B"} is a subset of this product set, and therefore finite.
+  We have to show injectivity of this tagging-function as
+  %
+  \begin{center}
+  @{term "\<forall>z. tag_str_SEQ A B x = tag_str_SEQ A B y \<and> x @ z \<in> A ;; B \<longrightarrow> y @ z \<in> A ;; B"}
+  \end{center}
+  %
+  \noindent
+  There are two cases to be considered (see pictures above). First, there exists 
+  a @{text "x'"} such that
+  @{text "x' \<in> A"}, @{text "x' \<le> x"} and @{text "(x - x') @ z \<in> B"} hold. We therefore have
+  %
+  \begin{center}
+  @{term "(\<approx>B `` {x - x'}) \<in> ({\<approx>B `` {x - x'} |x'. x' \<le> x \<and> x' \<in> A})"}
+  \end{center}
+  %
+  \noindent
+  and by the assumption about @{term "tag_str_SEQ A B"} also 
+  %
+  \begin{center}
+  @{term "(\<approx>B `` {x - x'}) \<in> ({\<approx>B `` {y - y'} |y'. y' \<le> y \<and> y' \<in> A})"}
+  \end{center}
+  %
+  \noindent
+  That means there must be a @{text "y'"} such that @{text "y' \<in> A"} and 
+  @{term "\<approx>B `` {x - x'} = \<approx>B `` {y - y'}"}. This equality means that
+  @{term "(x - x') \<approx>B (y - y')"} holds. Unfolding the Myhill-Nerode
+  relation and together with the fact that @{text "(x - x') @ z \<in> B"}, we
+  have @{text "(y - y') @ z \<in> B"}. We already know @{text "y' \<in> A"}, therefore
+  @{term "y @ z \<in> A ;; B"}, as needed in this case.
+
+  Second, there exists a @{text "z'"} such that @{term "x @ z' \<in> A"} and @{text "z - z' \<in> B"}.
+  By the assumption about @{term "tag_str_SEQ A B"} we have
+  @{term "\<approx>A `` {x} = \<approx>A `` {y}"} and thus @{term "x \<approx>A y"}. Which means by the Myhill-Nerode
+  relation that @{term "y @ z' \<in> A"} holds. Using @{text "z - z' \<in> B"}, we can conclude also in this case
+  with @{term "y @ z \<in> A ;; B"}. We again can complete the @{const SEQ}-case
+  by setting @{text A} to @{term "L r\<^isub>1"} and @{text B} to @{term "L r\<^isub>2"}.\qed 
+  \end{proof}
+  
+  \noindent
+  The case for @{const STAR} is similar to @{const SEQ}, but poses a few extra challenges. When
+  we analyse the case that @{text "x @ z"} is an element in @{term "A\<star>"} and @{text x} is not the 
+  empty string, we 
+  have the following picture:
+  %
+  \begin{center}
+  \scalebox{0.7}{
+  \begin{tikzpicture}
+    \node[draw,minimum height=3.8ex] (xa) { $\hspace{4em}@{text "x'\<^isub>m\<^isub>a\<^isub>x"}\hspace{4em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of xa] (xxa) { $\hspace{0.5em}@{text "x - x'\<^isub>m\<^isub>a\<^isub>x"}\hspace{0.5em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of xxa] (za) { $\hspace{2em}@{text "z\<^isub>a"}\hspace{2em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of za] (zb) { $\hspace{7em}@{text "z\<^isub>b"}\hspace{7em}$ };
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           (xa.north west) -- ($(xxa.north east)+(0em,0em)$)
+               node[midway, above=0.5em]{@{text x}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           (za.north west) -- ($(zb.north east)+(0em,0em)$)
+               node[midway, above=0.5em]{@{text z}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           ($(xa.north west)+(0em,3ex)$) -- ($(zb.north east)+(0em,3ex)$)
+               node[midway, above=0.8em]{@{term "x @ z \<in> A\<star>"}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           ($(za.south east)+(0em,0ex)$) -- ($(xxa.south west)+(0em,0ex)$)
+               node[midway, below=0.5em]{@{text "(x - x'\<^isub>m\<^isub>a\<^isub>x) @ z\<^isub>a \<in> A"}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           ($(xa.south east)+(0em,0ex)$) -- ($(xa.south west)+(0em,0ex)$)
+               node[midway, below=0.5em]{@{term "x'\<^isub>m\<^isub>a\<^isub>x \<in> A\<star>"}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           ($(zb.south east)+(0em,0ex)$) -- ($(zb.south west)+(0em,0ex)$)
+               node[midway, below=0.5em]{@{term "z\<^isub>b \<in> A\<star>"}};
+
+    \draw[decoration={brace,transform={yscale=3}},decorate]
+           ($(zb.south east)+(0em,-4ex)$) -- ($(xxa.south west)+(0em,-4ex)$)
+               node[midway, below=0.5em]{@{term "(x - x'\<^isub>m\<^isub>a\<^isub>x) @ z \<in> A\<star>"}};
+  \end{tikzpicture}}
+  \end{center}
+  %
+  \noindent
+  We can find a strict prefix @{text "x'"} of @{text x} such that @{term "x' \<in> A\<star>"},
+  @{text "x' < x"} and the rest @{term "(x - x') @ z \<in> A\<star>"}. For example the empty string 
+  @{text "[]"} would do.
+  There are potentially many such prefixes, but there can only be finitely many of them (the 
+  string @{text x} is finite). Let us therefore choose the longest one and call it 
+  @{text "x'\<^isub>m\<^isub>a\<^isub>x"}. Now for the rest of the string @{text "(x - x'\<^isub>m\<^isub>a\<^isub>x) @ z"} we
+  know it is in @{term "A\<star>"}. By definition of @{term "A\<star>"}, we can separate
+  this string into two parts, say @{text "a"} and @{text "b"}, such that @{text "a \<in> A"}
+  and @{term "b \<in> A\<star>"}. Now @{text a} must be strictly longer than @{text "x - x'\<^isub>m\<^isub>a\<^isub>x"},
+  otherwise @{text "x'\<^isub>m\<^isub>a\<^isub>x"} is not the longest prefix. That means @{text a}
+  `overlaps' with @{text z}, splitting it into two components @{text "z\<^isub>a"} and
+   @{text "z\<^isub>b"}. For this we know that @{text "(x - x'\<^isub>m\<^isub>a\<^isub>x) @ z\<^isub>a \<in> A"} and
+  @{term "z\<^isub>b \<in> A\<star>"}. To cut a story short, we have divided @{term "x @ z \<in> A\<star>"}
+  such that we have a string @{text a} with @{text "a \<in> A"} that lies just on the
+  `border' of @{text x} and @{text z}. This string is @{text "(x - x'\<^isub>m\<^isub>a\<^isub>x) @ z\<^isub>a"}.
+
+  In order to show that @{term "x @ z \<in> A\<star>"} implies @{term "y @ z \<in> A\<star>"}, we use
+  the following tagging-function:
+  %
+  \begin{center}
+  @{thm tag_str_STAR_def[where ?L1.0="A", THEN meta_eq_app]}\smallskip
+  \end{center}
+
+  \begin{proof}[@{const STAR}-Case]
+  If @{term "finite (UNIV // \<approx>A)"} 
+  then @{term "finite (Pow (UNIV // \<approx>A))"} holds. The range of
+  @{term "tag_str_STAR A"} is a subset of this set, and therefore finite.
+  Again we have to show injectivity of this tagging-function as  
+  %
+  \begin{center}
+  @{term "\<forall>z. tag_str_STAR A x = tag_str_STAR A y \<and> x @ z \<in> A\<star> \<longrightarrow> y @ z \<in> A\<star>"}
+  \end{center}  
+  %
+  \noindent
+  We first need to consider the case that @{text x} is the empty string.
+  From the assumption we can infer @{text y} is the empty string and
+  clearly have @{term "y @ z \<in> A\<star>"}. In case @{text x} is not the empty
+  string, we can divide the string @{text "x @ z"} as shown in the picture 
+  above. By the tagging-function we have
+  %
+  \begin{center}
+  @{term "\<approx>A `` {(x - x'\<^isub>m\<^isub>a\<^isub>x)} \<in> ({\<approx>A `` {x - x'} |x'. x' < x \<and> x' \<in> A\<star>})"}
+  \end{center}
+  %
+  \noindent
+  which by assumption is equal to
+  %
+  \begin{center}
+  @{term "\<approx>A `` {(x - x'\<^isub>m\<^isub>a\<^isub>x)} \<in> ({\<approx>A `` {y - y'} |y'. y' < y \<and> y' \<in> A\<star>})"}
+  \end{center}
+  %
+  \noindent 
+  and we know that we have a @{term "y' \<in> A\<star>"} and @{text "y' < y"}
+  and also know @{term "(x - x'\<^isub>m\<^isub>a\<^isub>x) \<approx>A (y - y')"}. Unfolding the Myhill-Nerode
+  relation we know @{term "(y - y') @ z\<^isub>a \<in> A"}. We also know that @{term "z\<^isub>b \<in> A\<star>"}.
+  Therefore @{term "y' @ ((y - y') @ z\<^isub>a) @ z\<^isub>b \<in> A\<star>"}, which means
+  @{term "y @ z \<in> A\<star>"}. As the last step we have to set @{text "A"} to @{term "L r"} and
+  complete the proof.\qed
+  \end{proof}
+*}
+
+section {* Second Part based on Partial Derivatives *}
+
+text {*
+   We briefly considered using the method Brzozowski presented in the
+  Appendix of~\cite{Brzozowski64} in order to prove the second
+  direction of the Myhill-Nerode theorem. There he calculates the
+  derivatives for regular expressions and shows that for every
+  language there can be only finitely many of them %derivations (if
+  regarded equal modulo ACI). We could have used as tagging-function
+  the set of derivatives of a regular expression with respect to a
+  language.  Using the fact that two strings are Myhill-Nerode related
+  whenever their derivative is the same, together with the fact that
+  there are only finitely such derivatives would give us a similar
+  argument as ours. However it seems not so easy to calculate the set
+  of derivatives modulo ACI. Therefore we preferred our direct method
+  of using tagging-functions. 
+
+*}
+
+section {* Closure Properties *}
+
+
+section {* Conclusion and Related Work *}
+
+text {*
+  In this paper we took the view that a regular language is one where there
+  exists a regular expression that matches all of its strings. Regular
+  expressions can conveniently be defined as a datatype in HOL-based theorem
+  provers. For us it was therefore interesting to find out how far we can push
+  this point of view. We have established in Isabelle/HOL both directions 
+  of the Myhill-Nerode theorem.
+  %
+  \begin{theorem}[The Myhill-Nerode Theorem]\mbox{}\\
+  A language @{text A} is regular if and only if @{thm (rhs) Myhill_Nerode}.
+  \end{theorem}
+  %
+  \noindent
+  Having formalised this theorem means we
+  pushed our point of view quite far. Using this theorem we can obviously prove when a language
+  is \emph{not} regular---by establishing that it has infinitely many
+  equivalence classes generated by the Myhill-Nerode relation (this is usually
+  the purpose of the pumping lemma \cite{Kozen97}).  We can also use it to
+  establish the standard textbook results about closure properties of regular
+  languages. Interesting is the case of closure under complement, because
+  it seems difficult to construct a regular expression for the complement
+  language by direct means. However the existence of such a regular expression
+  can be easily proved using the Myhill-Nerode theorem since 
+  %
+  \begin{center}
+  @{term "s\<^isub>1 \<approx>A s\<^isub>2"} if and only if @{term "s\<^isub>1 \<approx>(-A) s\<^isub>2"}
+  \end{center}
+  %
+  \noindent
+  holds for any strings @{text "s\<^isub>1"} and @{text
+  "s\<^isub>2"}. Therefore @{text A} and the complement language @{term "-A"} give rise to the same
+  partitions.  Proving the existence of such a regular expression via automata 
+  using the standard method would 
+  be quite involved. It includes the
+  steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
+  "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
+  regular expression.
+
+  While regular expressions are convenient in formalisations, they have some
+  limitations. One is that there seems to be no method of calculating a
+  minimal regular expression (for example in terms of length) for a regular
+  language, like there is
+  for automata. On the other hand, efficient regular expression matching,
+  without using automata, poses no problem \cite{OwensReppyTuron09}.
+  For an implementation of a simple regular expression matcher,
+  whose correctness has been formally established, we refer the reader to
+  Owens and Slind \cite{OwensSlind08}.
+
+
+  Our formalisation consists of 780 lines of Isabelle/Isar code for the first
+  direction and 460 for the second, plus around 300 lines of standard material about
+  regular languages. While this might be seen as too large to count as a
+  concise proof pearl, this should be seen in the context of the work done by
+  Constable at al \cite{Constable00} who formalised the Myhill-Nerode theorem
+  in Nuprl using automata. They write that their four-member team needed
+  something on the magnitude of 18 months for their formalisation. The
+  estimate for our formalisation is that we needed approximately 3 months and
+  this included the time to find our proof arguments. Unlike Constable et al,
+  who were able to follow the proofs from \cite{HopcroftUllman69}, we had to
+  find our own arguments.  So for us the formalisation was not the
+  bottleneck. It is hard to gauge the size of a formalisation in Nurpl, but
+  from what is shown in the Nuprl Math Library about their development it
+  seems substantially larger than ours. The code of ours can be found in the
+  Mercurial Repository at
+  \mbox{\url{http://www4.in.tum.de/~urbanc/regexp.html}}.
+
+
+  Our proof of the first direction is very much inspired by \emph{Brzozowski's
+  algebraic method} used to convert a finite automaton to a regular
+  expression \cite{Brzozowski64}. The close connection can be seen by considering the equivalence
+  classes as the states of the minimal automaton for the regular language.
+  However there are some subtle differences. Since we identify equivalence
+  classes with the states of the automaton, then the most natural choice is to
+  characterise each state with the set of strings starting from the initial
+  state leading up to that state. Usually, however, the states are characterised as the
+  strings starting from that state leading to the terminal states.  The first
+  choice has consequences about how the initial equational system is set up. We have
+  the $\lambda$-term on our `initial state', while Brzozowski has it on the
+  terminal states. This means we also need to reverse the direction of Arden's
+  Lemma.
+
+  This is also where our method shines, because we can completely
+  side-step the standard argument \cite{Kozen97} where automata need
+  to be composed, which as stated in the Introduction is not so easy
+  to formalise in a HOL-based theorem prover. However, it is also the
+  direction where we had to spend most of the `conceptual' time, as
+  our proof-argument based on tagging-functions is new for
+  establishing the Myhill-Nerode theorem. All standard proofs of this
+  direction proceed by arguments over automata.\medskip
+  
+  \noindent
+  {\bf Acknowledgements:} We are grateful for the comments we received from Larry
+  Paulson.
+
+*}
+
+
+(*<*)
+end
+(*>*)
\ No newline at end of file