(*<*)
theory Paper
imports "../Myhill" "LaTeXsugar"
begin
declare [[show_question_marks = false]]
consts
REL :: "(string \<times> string) \<Rightarrow> bool"
UPLUS :: "'a set \<Rightarrow> 'a set \<Rightarrow> (nat \<times> 'a) set"
abbreviation
"EClass x R \<equiv> R `` {x}"
abbreviation
"append_rexp2 r_itm r \<equiv> append_rexp r r_itm"
notation (latex output)
str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and
str_eq ("_ \<approx>\<^bsub>_\<^esub> _") and
Seq (infixr "\<cdot>" 100) and
Star ("_\<^bsup>\<star>\<^esup>") and
pow ("_\<^bsup>_\<^esup>" [100, 100] 100) and
Suc ("_+1" [100] 100) and
quotient ("_ \<^raw:\ensuremath{\!\sslash\!}> _" [90, 90] 90) and
REL ("\<approx>") and
UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and
L ("\<^raw:\ensuremath{\cal{L}}>'(_')" [0] 101) and
Lam ("\<lambda>'(_')" [100] 100) and
Trn ("'(_, _')" [100, 100] 100) and
EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and
transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100) and
Setalt ("\<^raw:\ensuremath{\bigplus}>_" [1000] 999) and
append_rexp2 ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 100) and
append_rhs_rexp ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 50) and
uminus ("\<^raw:\ensuremath{\overline{>_\<^raw:}}>" [100] 100)
(*>*)
section {* Introduction *}
text {*
Regular languages are an important and well-understood subject in Computer
Science, with many beautiful theorems and many useful algorithms. There is a
wide range of textbooks on this subject, many of which are aimed at students
and contain very detailed ``pencil-and-paper'' proofs
(e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by
formalising the theorems and by verifying formally the algorithms.
There is however a problem: the typical approach to regular languages is to
introduce finite automata and then define everything in terms of them. For
example, a regular language is normally defined as one whose strings are
recognised by a finite deterministic automaton. This approach has many
benefits. Among them is the fact that it is easy to convince oneself that
regular languages are closed under complementation: one just has to exchange
the accepting and non-accepting states in the corresponding automaton to
obtain an automaton for the complement language. The problem, however, lies with
formalising such reasoning in a HOL-based theorem prover, in our case
Isabelle/HOL. Automata are build up from states and transitions that
need to be represented as graphs, matrices or functions, none
of which can be defined as inductive datatype.
In case of graphs and matrices, this means we have to build our own
reasoning infrastructure for them, as neither Isabelle/HOL nor HOL4 nor
HOLlight support them with libraries. Even worse, reasoning about graphs and
matrices can be a real hassle in HOL-based theorem provers. Consider for
example the operation of sequencing two automata, say $A_1$ and $A_2$, by
connecting the accepting states of $A_1$ to the initial state of $A_2$:
\begin{center}
\begin{tabular}{ccc}
\begin{tikzpicture}[scale=0.8]
%\draw[step=2mm] (-1,-1) grid (1,1);
\draw[rounded corners=1mm, very thick] (-1.0,-0.3) rectangle (-0.2,0.3);
\draw[rounded corners=1mm, very thick] ( 0.2,-0.3) rectangle ( 1.0,0.3);
\node (A) at (-1.0,0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\node (B) at ( 0.2,0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\node (C) at (-0.2, 0.13) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\node (D) at (-0.2,-0.13) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\node (E) at (1.0, 0.2) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\node (F) at (1.0,-0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\node (G) at (1.0,-0.2) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\draw (-0.6,0.0) node {\footnotesize$A_1$};
\draw ( 0.6,0.0) node {\footnotesize$A_2$};
\end{tikzpicture}
&
\raisebox{1.1mm}{\bf\Large$\;\;\;\Rightarrow\,\;\;$}
&
\begin{tikzpicture}[scale=0.8]
%\draw[step=2mm] (-1,-1) grid (1,1);
\draw[rounded corners=1mm, very thick] (-1.0,-0.3) rectangle (-0.2,0.3);
\draw[rounded corners=1mm, very thick] ( 0.2,-0.3) rectangle ( 1.0,0.3);
\node (A) at (-1.0,0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\node (B) at ( 0.2,0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\node (C) at (-0.2, 0.13) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\node (D) at (-0.2,-0.13) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\node (E) at (1.0, 0.2) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\node (F) at (1.0,-0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\node (G) at (1.0,-0.2) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
\draw (C) to [very thick, bend left=45] (B);
\draw (D) to [very thick, bend right=45] (B);
\draw (-0.6,0.0) node {\footnotesize$A_1$};
\draw ( 0.6,0.0) node {\footnotesize$A_2$};
\end{tikzpicture}
\end{tabular}
\end{center}
\noindent
On ``paper'' we can define the corresponding graph in terms of the disjoint
union of the state nodes. Unfortunately in HOL, the standard definition for disjoint
union, namely
%
\begin{equation}\label{disjointunion}
@{term "UPLUS A\<^isub>1 A\<^isub>2 \<equiv> {(1, x) | x. x \<in> A\<^isub>1} \<union> {(2, y) | y. y \<in> A\<^isub>2}"}
\end{equation}
\noindent
changes the type---the disjoint union is not a set, but a set of pairs.
Using this definition for disjoint unions means we do not have a single type for automata
and hence will not be able to state certain properties about \emph{all}
automata, since there is no type quantification available in HOL. An
alternative, which provides us with a single type for automata, is to give every
state node an identity, for example a natural
number, and then be careful to rename these identities apart whenever
connecting two automata. This results in clunky proofs
establishing that properties are invariant under renaming. Similarly,
connecting two automata represented as matrices results in very adhoc
constructions, which are not pleasant to reason about.
Functions are much better supported in Isabelle/HOL, but they still lead to similar
problems as with graphs. Composing, for example, two non-deterministic automata in parallel
requires also the formalisation of disjoint unions. Nipkow \cite{Nipkow98}
dismisses for this the option of using identities, because it leads according to
him to ``messy proofs''. He
opts for a variant of \eqref{disjointunion} using bit lists, but writes
\begin{quote}
\it%
\begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
`` & All lemmas appear obvious given a picture of the composition of automata\ldots
Yet their proofs require a painful amount of detail.''
\end{tabular}
\end{quote}
\noindent
and
\begin{quote}
\it%
\begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
`` & If the reader finds the above treatment in terms of bit lists revoltingly
concrete, I cannot disagree. A more abstract approach is clearly desirable.''
\end{tabular}
\end{quote}
\noindent
Moreover, it is not so clear how to conveniently impose a finiteness condition
upon functions in order to represent \emph{finite} automata. The best is
probably to resort to more advanced reasoning frameworks, such as \emph{locales}
or \emph{type classes},
which are \emph{not} avaiable in all HOL-based theorem provers.
Because of these problems to do with representing automata, there seems
to be no substantial formalisation of automata theory and regular languages
carried out in HOL-based theorem provers. Nipkow establishes in
\cite{Nipkow98} the link between regular expressions and automata in
the context of lexing. The only larger formalisations of automata theory
are carried out in Nuprl \cite{Constable00} and in Coq (for example
\cite{Filliatre97}).
In this paper, we will not attempt to formalise automata theory in
Isabelle/HOL, but take a completely different approach to regular
languages. Instead of defining a regular language as one where there exists
an automaton that recognises all strings of the language, we define a
regular language as:
\begin{definition}
A language @{text A} is \emph{regular}, provided there is a regular expression that matches all
strings of @{text "A"}.
\end{definition}
\noindent
The reason is that regular expressions, unlike graphs, matrices and functions, can
be easily defined as inductive datatype. Consequently a corresponding reasoning
infrastructure comes for free. This has recently been exploited in HOL4 with a formalisation
of regular expression matching based on derivatives \cite{OwensSlind08} and
with an equivalence checker for regular expressions in Isabelle/HOL \cite{KraussNipkow11}.
The purpose of this paper is to
show that a central result about regular languages---the Myhill-Nerode theorem---can
be recreated by only using regular expressions. This theorem gives necessary
and sufficient conditions for when a language is regular. As a corollary of this
theorem we can easily establish the usual closure properties, including
complementation, for regular languages.\smallskip
\noindent
{\bf Contributions:}
There is an extensive literature on regular languages.
To our knowledge, our proof of the Myhill-Nerode theorem is the
first that is based on regular expressions, only. We prove the part of this theorem
stating that a regular expression has only finitely many partitions using certain
tagging-functions. Again to our best knowledge, these tagging functions have
not been used before to establish the Myhill-Nerode theorem.
*}
section {* Preliminaries *}
text {*
Strings in Isabelle/HOL are lists of characters with the \emph{empty string}
being represented by the empty list, written @{term "[]"}. \emph{Languages}
are sets of strings. The language containing all strings is written in
Isabelle/HOL as @{term "UNIV::string set"}. The concatenation of two languages
is written @{term "A ;; B"} and a language raised to the power @{text n} is written
@{term "A \<up> n"}. They are defined as usual
\begin{center}
@{thm Seq_def[THEN eq_reflection, where A1="A" and B1="B"]}
\hspace{7mm}
@{thm pow.simps(1)[THEN eq_reflection, where A1="A"]}
\hspace{7mm}
@{thm pow.simps(2)[THEN eq_reflection, where A1="A" and n1="n"]}
\end{center}
\noindent
where @{text "@"} is the usual list-append operation. The Kleene-star of a language @{text A}
is defined as the union over all powers, namely @{thm Star_def}. In the paper
we will make use of the following properties of these constructions.
\begin{proposition}\label{langprops}\mbox{}\\
\begin{tabular}{@ {}ll}
(i) & @{thm star_cases} \\
(ii) & @{thm[mode=IfThen] pow_length}\\
(iii) & @{thm seq_Union_left} \\
\end{tabular}
\end{proposition}
\noindent
In @{text "(ii)"} we use the notation @{term "length s"} for the length of a
string. This property states that if @{term "[] \<notin> A"} then the lengths of
the strings in @{term "A \<up> (Suc n)"} must be longer than @{text n}. We omit
the proofs for these properties, but invite the reader to consult our
formalisation.\footnote{Available at ???}
The notation in Isabelle/HOL for the quotient of a language @{text A} according to an
equivalence relation @{term REL} is @{term "A // REL"}. We will write
@{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined
as @{text "{y | y \<approx> x}"}.
Central to our proof will be the solution of equational systems
involving equivalence classes of languages. For this we will use Arden's lemma \cite{Brzozowski64}
which solves equations of the form @{term "X = A ;; X \<union> B"} provided
@{term "[] \<notin> A"}. However we will need the following ``reverse''
version of Arden's lemma.
\begin{lemma}[Reverse Arden's Lemma]\label{arden}\mbox{}\\
If @{thm (prem 1) arden} then
@{thm (lhs) arden} has the unique solution
@{thm (rhs) arden}.
\end{lemma}
\begin{proof}
For the right-to-left direction we assume @{thm (rhs) arden} and show
that @{thm (lhs) arden} holds. From Prop.~\ref{langprops}@{text "(i)"}
we have @{term "A\<star> = {[]} \<union> A ;; A\<star>"},
which is equal to @{term "A\<star> = {[]} \<union> A\<star> ;; A"}. Adding @{text B} to both
sides gives @{term "B ;; A\<star> = B ;; ({[]} \<union> A\<star> ;; A)"}, whose right-hand side
is equal to @{term "(B ;; A\<star>) ;; A \<union> B"}. This completes this direction.
For the other direction we assume @{thm (lhs) arden}. By a simple induction
on @{text n}, we can establish the property
\begin{center}
@{text "(*)"}\hspace{5mm} @{thm (concl) arden_helper}
\end{center}
\noindent
Using this property we can show that @{term "B ;; (A \<up> n) \<subseteq> X"} holds for
all @{text n}. From this we can infer @{term "B ;; A\<star> \<subseteq> X"} using the definition
of @{text "\<star>"}.
For the inclusion in the other direction we assume a string @{text s}
with length @{text k} is element in @{text X}. Since @{thm (prem 1) arden}
we know by Prop.~\ref{langprops}@{text "(ii)"} that
@{term "s \<notin> X ;; (A \<up> Suc k)"} since its length is only @{text k}
(the strings in @{term "X ;; (A \<up> Suc k)"} are all longer).
From @{text "(*)"} it follows then that
@{term s} must be element in @{term "(\<Union>m\<in>{0..k}. B ;; (A \<up> m))"}. This in turn
implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Prop.~\ref{langprops}@{text "(iii)"}
this is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed
\end{proof}
\noindent
Regular expressions are defined as the inductive datatype
\begin{center}
@{text r} @{text "::="}
@{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
@{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
@{term "CHAR c"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
@{term "SEQ r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
@{term "ALT r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
@{term "STAR r"}
\end{center}
\noindent
and the language matched by a regular expression is defined as
\begin{center}
\begin{tabular}{c@ {\hspace{10mm}}c}
\begin{tabular}{rcl}
@{thm (lhs) L_rexp.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) L_rexp.simps(1)}\\
@{thm (lhs) L_rexp.simps(2)} & @{text "\<equiv>"} & @{thm (rhs) L_rexp.simps(2)}\\
@{thm (lhs) L_rexp.simps(3)[where c="c"]} & @{text "\<equiv>"} & @{thm (rhs) L_rexp.simps(3)[where c="c"]}\\
\end{tabular}
&
\begin{tabular}{rcl}
@{thm (lhs) L_rexp.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]} & @{text "\<equiv>"} &
@{thm (rhs) L_rexp.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
@{thm (lhs) L_rexp.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]} & @{text "\<equiv>"} &
@{thm (rhs) L_rexp.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
@{thm (lhs) L_rexp.simps(6)[where r="r"]} & @{text "\<equiv>"} &
@{thm (rhs) L_rexp.simps(6)[where r="r"]}\\
\end{tabular}
\end{tabular}
\end{center}
Given a finite set of regular expressions @{text rs}, we will make use of the operation of generating
a regular expression that matches all languages of @{text rs}. We only need to know the existence
of such a regular expression and therefore we use Isabelle/HOL's @{const "fold_graph"} and Hilbert's
@{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This operation, roughly speaking, folds @{const ALT} over the
set @{text rs} with @{const NULL} for the empty set. We can prove that for a finite set @{text rs}
%
\begin{equation}\label{uplus}
\mbox{@{thm (lhs) folds_alt_simp} @{text "= \<Union> (\<calL> ` rs)"}}
\end{equation}
\noindent
holds, whereby @{text "\<calL> ` rs"} stands for the
image of the set @{text rs} under function @{text "\<calL>"}.
*}
section {* The Myhill-Nerode Theorem, First Part *}
text {*
The key definition in the Myhill-Nerode theorem is the
\emph{Myhill-Nerode relation}, which states that w.r.t.~a language two
strings are related, provided there is no distinguishing extension in this
language. This can be defined as:
\begin{definition}[Myhill-Nerode Relation]\mbox{}\\
@{thm str_eq_def[simplified str_eq_rel_def Pair_Collect]}
\end{definition}
\noindent
It is easy to see that @{term "\<approx>A"} is an equivalence relation, which
partitions the set of all strings, @{text "UNIV"}, into a set of disjoint
equivalence classes. To illustrate this quotient construction, let us give a simple
example: consider the regular language containing just
the string @{text "[c]"}. The relation @{term "\<approx>({[c]})"} partitions @{text UNIV}
into three equivalence classes @{text "X\<^isub>1"}, @{text "X\<^isub>2"} and @{text "X\<^isub>3"}
as follows
\begin{center}
@{text "X\<^isub>1 = {[]}"}\hspace{5mm}
@{text "X\<^isub>2 = {[c]}"}\hspace{5mm}
@{text "X\<^isub>3 = UNIV - {[], [c]}"}
\end{center}
One direction of the Myhill-Nerode theorem establishes
that if there are finitely many equivalence classes, like in the example above, then
the language is regular. In our setting we therefore have to show:
\begin{theorem}\label{myhillnerodeone}
@{thm[mode=IfThen] Myhill_Nerode1}
\end{theorem}
\noindent
To prove this theorem, we first define the set @{term "finals A"} as those equivalence
classes from @{term "UNIV // \<approx>A"} that contain strings of @{text A}, namely
%
\begin{equation}
@{thm finals_def}
\end{equation}
\noindent
In our running example, @{text "X\<^isub>2"} is the only equivalence class in @{term "finals {[c]}"}.
It is straightforward to show that in general @{thm lang_is_union_of_finals} and
@{thm finals_in_partitions} hold.
Therefore if we know that there exists a regular expression for every
equivalence class in \mbox{@{term "finals A"}} (which by assumption must be
a finite set), then we can use @{text "\<bigplus>"} to obtain a regular expression
that matches every string in @{text A}.
Our proof of Thm.~\ref{myhillnerodeone} relies on a method that can calculate a
regular expression for \emph{every} equivalence class, not just the ones
in @{term "finals A"}. We
first define the notion of \emph{one-character-transition} between
two equivalence classes
%
\begin{equation}
@{thm transition_def}
\end{equation}
\noindent
which means that if we concatenate the character @{text c} to the end of all
strings in the equivalence class @{text Y}, we obtain a subset of
@{text X}. Note that we do not define an automaton here, we merely relate two sets
(with the help of a character). In our concrete example we have
@{term "X\<^isub>1 \<Turnstile>c\<Rightarrow> X\<^isub>2"}, @{term "X\<^isub>1 \<Turnstile>d\<Rightarrow> X\<^isub>3"} with @{text d} being any
other character than @{text c}, and @{term "X\<^isub>3 \<Turnstile>d\<Rightarrow> X\<^isub>3"} for any @{text d}.
Next we build an \emph{initial equational system} that
contains an equation for each equivalence class. Suppose we have
the equivalence classes @{text "X\<^isub>1,\<dots>,X\<^isub>n"}, there must be one and only one that
contains the empty string @{text "[]"} (since equivalence classes are disjoint).
Let us assume @{text "[] \<in> X\<^isub>1"}. We build the following equational system
\begin{center}
\begin{tabular}{rcl}
@{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) + \<dots> + (Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) + \<lambda>(EMPTY)"} \\
@{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, CHAR c\<^isub>2\<^isub>1) + \<dots> + (Y\<^isub>2\<^isub>o, CHAR c\<^isub>2\<^isub>o)"} \\
& $\vdots$ \\
@{text "X\<^isub>n"} & @{text "="} & @{text "(Y\<^isub>n\<^isub>1, CHAR c\<^isub>n\<^isub>1) + \<dots> + (Y\<^isub>n\<^isub>q, CHAR c\<^isub>n\<^isub>q)"}\\
\end{tabular}
\end{center}
\noindent
where the terms @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"}
stand for all transitions @{term "Y\<^isub>i\<^isub>j \<Turnstile>c\<^isub>i\<^isub>j\<Rightarrow>
X\<^isub>i"}. There can only be
finitely many such terms in a right-hand side since by assumption there are only finitely many
equivalence classes and only finitely many characters. The term @{text
"\<lambda>(EMPTY)"} in the first equation acts as a marker for the equivalence class
containing @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
single ``initial'' state in the equational system, which is different from
the method by Brzozowski \cite{Brzozowski64}, where he marks the
``terminal'' states. We are forced to set up the equational system in our
way, because the Myhill-Nerode relation determines the ``direction'' of the
transitions. The successor ``state'' of an equivalence class @{text Y} can
be reached by adding characters to the end of @{text Y}. This is also the
reason why we have to use our reverse version of Arden's lemma.}
Overloading the function @{text \<calL>} for the two kinds of terms in the
equational system, we have
\begin{center}
@{text "\<calL>(Y, r) \<equiv>"} %
@{thm (rhs) L_rhs_item.simps(2)[where X="Y" and r="r", THEN eq_reflection]}\hspace{10mm}
@{thm L_rhs_item.simps(1)[where r="r", THEN eq_reflection]}
\end{center}
\noindent
and we can prove for @{text "X\<^isub>2\<^isub>.\<^isub>.\<^isub>n"} that the following equations
%
\begin{equation}\label{inv1}
@{text "X\<^isub>i = \<calL>(Y\<^isub>i\<^isub>1, CHAR c\<^isub>i\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>i\<^isub>q, CHAR c\<^isub>i\<^isub>q)"}.
\end{equation}
\noindent
hold. Similarly for @{text "X\<^isub>1"} we can show the following equation
%
\begin{equation}\label{inv2}
@{text "X\<^isub>1 = \<calL>(Y\<^isub>i\<^isub>1, CHAR c\<^isub>i\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>i\<^isub>p, CHAR c\<^isub>i\<^isub>p) \<union> \<calL>(\<lambda>(EMPTY))"}.
\end{equation}
\noindent
The reason for adding the @{text \<lambda>}-marker to our initial equational system is
to obtain this equation: it only holds with the marker, since none of
the other terms contain the empty string. The point of the initial equational system is
that solving it means we will be able to extract a regular expression for every equivalence class.
Our representation for the equations in Isabelle/HOL are pairs,
where the first component is an equivalence class (a set of strings)
and the second component
is a set of terms. Given a set of equivalence
classes @{text CS}, our initial equational system @{term "Init CS"} is thus
formally defined as
%
\begin{equation}\label{initcs}
\mbox{\begin{tabular}{rcl}
@{thm (lhs) Init_rhs_def} & @{text "\<equiv>"} &
@{text "if"}~@{term "[] \<in> X"}\\
& & @{text "then"}~@{term "{Trn Y (CHAR c) | Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X} \<union> {Lam EMPTY}"}\\
& & @{text "else"}~@{term "{Trn Y (CHAR c)| Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X}"}\\
@{thm (lhs) Init_def} & @{text "\<equiv>"} & @{thm (rhs) Init_def}
\end{tabular}}
\end{equation}
\noindent
Because we use sets of terms
for representing the right-hand sides of equations, we can
prove \eqref{inv1} and \eqref{inv2} more concisely as
%
\begin{lemma}\label{inv}
If @{thm (prem 1) test} then @{text "X = \<Union> \<calL> ` rhs"}.
\end{lemma}
\noindent
Our proof of Thm.~\ref{myhillnerodeone} will proceed by transforming the
initial equational system into one in \emph{solved form} maintaining the invariant
in Lem.~\ref{inv}. From the solved form we will be able to read
off the regular expressions.
In order to transform an equational system into solved form, we have two
operations: one that takes an equation of the form @{text "X = rhs"} and removes
any recursive occurrences of @{text X} in the @{text rhs} using our variant of Arden's
Lemma. The other operation takes an equation @{text "X = rhs"}
and substitutes @{text X} throughout the rest of the equational system
adjusting the remaining regular expressions appropriately. To define this adjustment
we define the \emph{append-operation} taking a term and a regular expression as argument
\begin{center}
@{thm append_rexp.simps(2)[where X="Y" and r="r\<^isub>1" and rexp="r\<^isub>2", THEN eq_reflection]}\hspace{10mm}
@{thm append_rexp.simps(1)[where r="r\<^isub>1" and rexp="r\<^isub>2", THEN eq_reflection]}
\end{center}
\noindent
We lift this operation to entire right-hand sides of equations, written as
@{thm (lhs) append_rhs_rexp_def[where rexp="r"]}. With this we can define
the \emph{arden-operation} for an equation of the form @{text "X = rhs"} as:
%
\begin{equation}\label{arden_def}
\mbox{\begin{tabular}{rc@ {\hspace{2mm}}r@ {\hspace{1mm}}l}
@{thm (lhs) Arden_def} & @{text "\<equiv>"}~~\mbox{} & \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "let"}}\\
& & @{text "rhs' ="} & @{term "rhs - {Trn X r | r. Trn X r \<in> rhs}"} \\
& & @{text "r' ="} & @{term "STAR (\<Uplus> {r. Trn X r \<in> rhs})"}\\
& & \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "in"}~~@{term "append_rhs_rexp rhs' r'"}}\\
\end{tabular}}
\end{equation}
\noindent
In this definition, we first delete all terms of the form @{text "(X, r)"} from @{text rhs};
then we calculate the combined regular expressions for all @{text r} coming
from the deleted @{text "(X, r)"}, and take the @{const STAR} of it;
finally we append this regular expression to @{text rhs'}. It can be easily seen
that this operation mimics Arden's lemma on the level of equations. To ensure
the non-emptiness condition of Arden's lemma we say that a right-hand side is
\emph{ardenable} provided
\begin{center}
@{thm ardenable_def}
\end{center}
\noindent
This allows us to prove
\begin{lemma}\label{ardenable}
If @{text "X = \<Union>\<calL> ` rhs"},
@{thm (prem 2) Arden_keeps_eq} and
@{thm (prem 3) Arden_keeps_eq}, then
@{text "X = \<Union>\<calL> ` (Arden X rhs)"}
\end{lemma}
\noindent
The \emph{substituion-operation} takes an equation
of the form @{text "X = xrhs"} and substitutes it into the right-hand side @{text rhs}.
\begin{center}
\begin{tabular}{rc@ {\hspace{2mm}}r@ {\hspace{1mm}}l}
@{thm (lhs) Subst_def} & @{text "\<equiv>"}~~\mbox{} & \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "let"}}\\
& & @{text "rhs' ="} & @{term "rhs - {Trn X r | r. Trn X r \<in> rhs}"} \\
& & @{text "r' ="} & @{term "\<Uplus> {r. Trn X r \<in> rhs}"}\\
& & \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "in"}~~@{term "rhs' \<union> append_rhs_rexp xrhs r'"}}\\
\end{tabular}
\end{center}
\noindent
We again delete first all occurrence of @{text "(X, r)"} in @{text rhs}; we then calculate
the regular expression corresponding to the deleted terms; finally we append this
regular expression to @{text "xrhs"} and union it up with @{text rhs'}. When we use
the substitution operation we will arrange it so that @{text "xrhs"} does not contain
any occurrence of @{text X}.
With these two operation in place, we can define the operation that removes one equation
from an equational systems @{text ES}. The operation @{const Subst_all}
substitutes an equation @{text "X = xrhs"} throughout an equational system @{text ES};
@{const Remove} then completely removes such an equation from @{text ES} by substituting
it to the rest of the equational system, but first eliminating all recursive occurrences
of @{text X} by applying @{const Arden} to @{text "xrhs"}.
\begin{center}
\begin{tabular}{rcl}
@{thm (lhs) Subst_all_def} & @{text "\<equiv>"} & @{thm (rhs) Subst_all_def}\\
@{thm (lhs) Remove_def} & @{text "\<equiv>"} & @{thm (rhs) Remove_def}
\end{tabular}
\end{center}
\noindent
Finally, we can define how an equational system should be solved. For this
we will need to iterate the process of eliminating equations until only one equation
will be left in the system. However, we not just want to have any equation
as being the last one, but the one involving the equivalence class for
which we want to calculate the regular
expression. Let us suppose this equivalence class is @{text X}.
Since @{text X} is the one to be solved, in every iteration step we have to pick an
equation to be eliminated that is different from @{text X}. In this way
@{text X} is kept to the final step. The choice is implemented using Hilbert's choice
operator, written @{text SOME} in the definition below.
\begin{center}
\begin{tabular}{rc@ {\hspace{4mm}}r@ {\hspace{1mm}}l}
@{thm (lhs) Iter_def} & @{text "\<equiv>"}~~\mbox{} & \multicolumn{2}{@ {\hspace{-4mm}}l}{@{text "let"}}\\
& & @{text "(Y, yrhs) ="} & @{term "SOME (Y, yrhs). (Y, yrhs) \<in> ES \<and> X \<noteq> Y"} \\
& & \multicolumn{2}{@ {\hspace{-4mm}}l}{@{text "in"}~~@{term "Remove ES Y yrhs"}}\\
\end{tabular}
\end{center}
\noindent
The last definition we need applies @{term Iter} over and over until a condition
@{text Cond} is \emph{not} satisfied anymore. The condition states that there
are more than one equation left in the equational system @{text ES}. To solve
an equational system we use Isabelle/HOL's @{text while}-operator as follows:
\begin{center}
@{thm Solve_def}
\end{center}
\noindent
We are not concerned here with the definition of this operator
(see \cite{BerghoferNipkow00}), but note that we eliminate
in each @{const Iter}-step a single equation, and therefore
have a well-founded termination order by taking the cardinality
of the equational system @{text ES}. This enables us to prove
properties about our definition of @{const Solve} when we ``call'' it with
the equivalence class @{text X} and the initial equational system
@{term "Init (UNIV // \<approx>A)"} from
\eqref{initcs} using the principle:
%
\begin{equation}\label{whileprinciple}
\mbox{\begin{tabular}{l}
@{term "invariant (Init (UNIV // \<approx>A))"} \\
@{term "\<forall>ES. invariant ES \<and> Cond ES \<longrightarrow> invariant (Iter X ES)"}\\
@{term "\<forall>ES. invariant ES \<and> Cond ES \<longrightarrow> card (Iter X ES) < card ES"}\\
@{term "\<forall>ES. invariant ES \<and> \<not> Cond ES \<longrightarrow> P ES"}\\
\hline
\multicolumn{1}{c}{@{term "P (Solve X (Init (UNIV // \<approx>A)))"}}
\end{tabular}}
\end{equation}
\noindent
This principle states that given an invariant (which we will specify below)
we can prove a property
@{text "P"} involving @{const Solve}. For this we have to discharge the following
proof obligations: first the
initial equational system satisfies the invariant; second that the iteration
step @{text "Iter"} preserves the the invariant as long as the condition @{term Cond} holds;
third that @{text "Iter"} decreases the termination order, and fourth that
once the condition does not hold anymore then the property @{text P} must hold.
The property @{term P} in our proof will state that @{term "Solve X (Init (UNIV // \<approx>A))"}
returns with a single equation @{text "X = xrhs"} for some @{text "xrhs"}, and
that this equational system still satisfies the invariant. In order to get
the proof through, the invariant is composed of the following six properties:
\begin{center}
\begin{tabular}{@ {}rcl@ {\hspace{-13mm}}l @ {}}
@{text "invariant ES"} & @{text "\<equiv>"} &
@{term "finite ES"} & @{text "(finiteness)"}\\
& @{text "\<and>"} & @{thm (rhs) finite_rhs_def} & @{text "(finiteness rhs)"}\\
& @{text "\<and>"} & @{text "\<forall>(X, rhs)\<in>ES. X = \<Union>\<calL> ` rhs"} & @{text "(soundness)"}\\
& @{text "\<and>"} & @{thm (rhs) distinct_equas_def}\\
& & & @{text "(distinctness)"}\\
& @{text "\<and>"} & @{thm (rhs) ardenable_all_def} & @{text "(ardenable)"}\\
& @{text "\<and>"} & @{thm (rhs) valid_eqs_def} & @{text "(validity)"}\\
\end{tabular}
\end{center}
\noindent
The first two ensure that the equational system is always finite (number of equations
and number of terms in each equation); the second makes sure the ``meaning'' of the
equations is preserved under our transformations. The other properties are a bit more
technical, but are needed to get our proof through. Distinctness states that every
equation in the system is distinct. Ardenable ensures that we can always
apply the arden operation.
The last property states that every @{text rhs} can only contain equivalence classes
for which there is an equation. Therefore @{text lhss} is just the set containing
the first components of an equational system,
while @{text "rhss"} collects all equivalence classes @{text X} in the terms of the
form @{term "Trn X r"}. That means @{thm (lhs) lhss_def}~@{text "\<equiv> {X | (X, rhs) \<in> ES}"}
and @{thm (lhs) rhss_def}~@{text "\<equiv> {X | (X, r) \<in> rhs}"}.
It is straightforward to prove that the initial equational system satisfies the
invariant.
\begin{lemma}\label{invzero}
@{thm[mode=IfThen] Init_ES_satisfies_invariant}
\end{lemma}
\begin{proof}
Finiteness is given by the assumption and the way how we set up the
initial equational system. Soundness is proved in Lem.~\ref{inv}. Distinctness
follows from the fact that the equivalence classes are disjoint. The ardenable
property also follows from the setup of the equational system, as does
validity.\qed
\end{proof}
\begin{lemma}\label{iterone}
@{thm[mode=IfThen] iteration_step_invariant[where xrhs="rhs"]}
\end{lemma}
\begin{proof}
This boils down to choosing an equation @{text "Y = yrhs"} to be eliminated
and to show that @{term "Subst_all (ES - {(Y, yrhs)}) Y (Arden Y yrhs)"}
preserves the invariant.
We prove this as follows:
\begin{center}
@{text "\<forall> ES."} @{thm (prem 1) Subst_all_satisfies_invariant} implies
@{thm (concl) Subst_all_satisfies_invariant}
\end{center}
\noindent
Finiteness is straightforward, as @{const Subst} and @{const Arden} operations
keep the equational system finite. These operation also preserve soundness
distinctness (we proved soundness for @{const Arden} in Lem.~\ref{ardenable}).
The property Ardenable is clearly preserved because the append-operation
cannot make a regular expression to match the empty string. Validity is
given because @{const Arden} removes an equivalence class from @{text yrhs}
and then @{const Subst_all} removes @{text Y} from the equational system.
Having proved the implication above, we can replace @{text "ES"} with @{text "ES - {(Y, yrhs)}"}
which matches with our proof-obligation of @{const "Subst_all"}. Since
\mbox{@{term "ES = ES - {(Y, yrhs)} \<union> {(Y, yrhs)}"}}, we can use our assumption
to complete the proof.\qed
\end{proof}
\begin{lemma}\label{itertwo}
@{thm[mode=IfThen] iteration_step_measure[simplified (no_asm), where xrhs="rhs"]}
\end{lemma}
\begin{proof}
By assumption we know that @{text "ES"} is finite and has more than one element.
Therefore there must be an element @{term "(Y, yrhs) \<in> ES"} with
@{term "(Y, yrhs) \<noteq> (X, rhs)"}. Using the distinctness property we can infer
that @{term "Y \<noteq> X"}. We further know that @{text "Remove ES Y yrhs"}
removes the equation @{text "Y = yrhs"} from the system, and therefore
the cardinality of @{const Iter} strictly decreases.\qed
\end{proof}
\begin{lemma}
If @{thm (prem 1) Solve} and @{thm (prem 2) Solve} then there exists
a @{text rhs} such that @{term "Solve X (Init (UNIV // \<approx>A)) = {(X, rhs)}"}
and @{term "invariant {(X, rhs)}"}.
\end{lemma}
\begin{proof}
In order to prove this lemma using \eqref{whileprinciple}, we have to use a slightly
stronger invariant since Lem.~\ref{iterone} and \ref{itertwo} have the precondition
that @{term "(X, rhs) \<in> ES"} for some @{text rhs}. This precondition is needed
in order to choose in the @{const Iter}-step an equation that is not \mbox{@{term "X = rhs"}}.
Therefore our invariant is cannot be just @{term "invariant ES"}, but must be
@{term "invariant ES \<and> (\<exists>rhs. (X, rhs) \<in> ES)"}. By assumption
@{thm (prem 2) Solve} and Lem.~\ref{invzero}, the more general invariant holds for
the initial equational system. This is premise 1 of~\eqref{whileprinciple}.
Premise 2 is given by Lem.~\ref{iterone} and the fact that @{const Iter} might
modify the @{text rhs} in the equation @{term "X = rhs"}, but does not remove it.
Premise 3 of~\eqref{whileprinciple} is by Lem.~\ref{itertwo}. Now in premise 4
we like to show that there exists a @{text rhs} such that @{term "ES = {(X, rhs)}"}
and that @{text "invariant {(X, rhs)}"} holds, provided the condition @{text "Cond"}
does not hols. By the stronger invariant we know there exists such a @{text "rhs"}
with @{term "(X, rhs) \<in> ES"}. Because @{text Cond} is not true, we know the cardinality
of @{text ES} is @{text 1}. This means @{text "ES"} must actually be the equation @{text "X = rhs"},
for which the invariant holds. This allows us to conclude that
@{term "Solve X (Init (UNIV // \<approx>A)) = {(X, rhs)}"} and @{term "invariant {(X, rhs)}"} hold.\qed
\end{proof}
\noindent
With this lemma in place we can show that for every equivalence class in @{term "UNIV // \<approx>A"}
there exists a regular expression.
\begin{lemma}\label{every_eqcl_has_reg}
@{thm[mode=IfThen] every_eqcl_has_reg}
\end{lemma}
\begin{proof}
By the preceeding Lemma, we know that there exists a @{text "rhs"} such
that @{term "Solve X (Init (UNIV // \<approx>A))"} returns the equation @{text "X = rhs"},
and that the invariant holds for this equation. That means we
know @{text "X = \<Union>\<calL> ` rhs"}. We further know that
this is equal to \mbox{@{text "\<Union>\<calL> ` (Arden X rhs)"}} using the properties of the
invariant and Lem.\ref{ardenable}. Using the validity property for the equation @{text "X = rhs"},
we can infer that @{term "rhss rhs \<subseteq> {X}"} and because the arden operation
removes that @{text X} from @{text rhs}, that @{term "rhss (Arden X rhs) = {}"}.
That means @{term "Arden X rhs"} can only consist of terms of the form @{term "Lam r"}.
So we can collect those (finitely many) regular expressions and have @{term "X = L (\<Uplus>rs)"}.
With this we can conclude the proof.\qed
\end{proof}
\noindent
Lem.~\ref{every_eqcl_has_reg} allows us to finally give a proof for the first direction
of the Myhill-Nerode theorem.
\begin{proof}[of Thm.~\ref{myhillnerodeone}]
By Lem.~\ref{every_eqcl_has_reg} we know that there exists a regular language for
every equivalence class in @{term "UNIV // \<approx>A"}. Since @{text "finals A"} is
a subset of @{term "UNIV // \<approx>A"}, we also know that for every equivalence class
in @{term "finals A"} there exists a regular language. Moreover by assumption
we know that @{term "finals A"} must be finite, and therefore there must be a finite
set of regular expressions @{text "rs"} such that
\begin{center}
@{term "\<Union>(finals A) = L (\<Uplus>rs)"}
\end{center}
\noindent
Since the left-hand side is equal to @{text A}, we can use @{term "\<Uplus>rs"}
as the regular expression that is needed in the theorem.\qed
\end{proof}
*}
section {* Myhill-Nerode, Second Part *}
text {*
TO BE DONE
\begin{theorem}
Given @{text "r"} is a regular expressions, then @{thm Myhill_Nerode2}.
\end{theorem}
% \begin{proof}
% By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY}
% and @{const CHAR} are straightforward, because we can easily establish
% \begin{center}
% \begin{tabular}{l}
% @{thm quot_null_eq}\\
% @{thm quot_empty_subset}\\
% @{thm quot_char_subset}
% \end{tabular}
% \end{center}
%
% \end{proof}
% @{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]}
% @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]}
% @{thm tag_str_STAR_def[where ?L1.0="A"]}
*}
section {* Conclusion and Related Work *}
text {*
In this paper we took the view that a regular language is one where there
exists a regular expression that matches all its strings. Regular
expressions can be conveniently defined as a datatype in a HOL-based theorem
prover. For us it was therefore interesting to find out how far we can push
this point of view.
Having formalised the Myhill-Nerode theorem means we
pushed quite far. Using this theorem we can obviously prove when a language
is \emph{not} regular---by establishing that it has infinitely many
equivalence classes generated by the Myhill-Nerode relation (this is usually
the purpose of the pumping lemma \cite{Kozen97}). We can also use it to
establish the standard textbook results about closure properties of regular
languages. Interesting is the case of closure under complement, because
it seems difficult to construct a regular expression for the complement
language by direct means. However the existence can be easily proved using
the Myhill-Nerode theorem since clearly
\begin{center}
@{term "s\<^isub>1 \<approx>A s\<^isub>2"} if and only if @{term "s\<^isub>1 \<approx>(-A) s\<^isub>2"}
\end{center}
\noindent
holds for any strings @{text "s\<^isub>1"} and @{text
"s\<^isub>2"}. Therefore @{text A} and @{term "-A"} give rise to the same
partitions. From the closure under complementation follows also the closure
under intersection and set difference by some simple set calculations.
Proving the same result via automata would be quite involved. It includes the
steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
"\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
regular expression.
Our formalisation consists of ??? lines of Isar code for the first
direction and ??? for the second. While this might be seen as too large
to count as a concise proof pearl, this should be seen in the context
of the work done by Constable at al \cite{Constable00} who formalised
the Myhill-Nerode theorem in Nuprl using automata. They write that
their four-member team needed something on the magnitute of 18 months
to formalise the Myhill-Nerode theorem. Our estimate is that we needed
approximately 3 months for our fomalisation and this included the time
to find our proof arguments, as we could not find them in the literature.
So for us the formalisation was not the bottleneck. It is hard for us
to gauge the size of a formalisation in Nurpl, but from what is shown in
the Nuprl Math Library their development seems substantially larger.
Our proof of the first direction is very much inspired by \emph{Brzozowski's
algebraic mehod} used to convert a finite automaton to a regular
expression. The close connection can be seen by considering the equivalence
classes as the states of the minimal automaton for the regular language.
However there are some subtle differences. If we identify equivalence
classes with the states of the automaton, then the most natural choice is to
characterise each state with the set of strings starting from the initial
state leading up to that state. Usually the states are characterised as the
ones starting from that state leading to the terminal states. The first
choice has consequences how the initial equational system is set up. We have
the $\lambda$-term on our ``initial state'', while Brzozowski has it on the
terminal states. This means we also need to reverse the direction of Arden's
lemma.
We briefly considered using the method Brzozowski presented in the Appendix
of \cite{Brzozowski64} in order to prove the second direction of the
Myhill-Nerode theorem. There he calculates the derivatives for regular
expressions and shows that there can be only finitely many of them. We could
use as the tag of a string @{text s} the derivative of a regular expression
generated with respect to @{text s}. Using the fact that two strings are
Myhill-Nerode related whenever their derivative is the same together with
the fact that there are only finitely many derivatives for a regular
expression would give us the same argument. However it seems not so easy to
calculate the derivatives and then to count them. Therefore we preferred our
direct method of using tagging-functions involving equivalence classes. This
is also where our method shines, because we can completely side-step the
standard argument \cite{Kozen97} where automata need to be composed, which
is not so convenient to formalise in a HOL-based theorem prover.
While regular expressions are convenient in formalisations, they have some
limitations. One is that there seems to be no notion of a minimal regular
expression, like there is a notion of a minimal automaton for a regular
expression.
*}
(*<*)
end
(*>*)