Journal/Paper.thy
changeset 350 8ce9a432680b
parent 348 bea94f1e6771
child 372 2c56b20032a7
\noindent
Regular languages are an important and well-understood subject in Computer
Science, with many beautiful theorems and many useful algorithms. There is a
wide range of textbooks on this subject, many of which are aimed at students
and contain very detailed `pencil-and-paper' proofs
(e.g.~the textbooks by \citeN{HopcroftUllman69} and by \citeN{Kozen97}).
It seems natural to exercise theorem provers by
formalising the theorems and by formally verifying the algorithms.

A popular choice for a theorem prover would be one based on Higher-Order
Logic (HOL), for example HOL4, HOLlight or Isabelle/HOL. For the development
presented in this paper we will use Isabelle/HOL. HOL is a predicate calculus
that allows quantification over predicate variables. Its type system is
based on the Simple Theory of Types by \citeN{Church40}.  Although many
mathematical concepts can be conveniently expressed in HOL, there are some
limitations that hurt when attempting a simple-minded formalisation
of regular languages in it.

The typical approach to
regular languages, taken for example by \citeN{HopcroftUllman69}
and by \citeN{Kozen97}, is to introduce finite deterministic automata and then
define most notions in terms of them.  For example, a regular language is
normally defined as:

\begin{dfntn}\label{baddef}
A language @{text A} is \emph{regular}, provided there is a
finite deterministic automaton that recognises all strings of @{text "A"}.
convince oneself that regular languages are closed under complementation:
one just has to exchange the accepting and non-accepting states in the
corresponding automaton to obtain an automaton for the complement language.
The problem, however, lies with formalising such reasoning in a
theorem prover. Automata are built up from states and transitions that are
commonly represented as graphs, matrices or functions, none of which, unfortunately,
can be defined as an inductive datatype.

In the case of graphs and matrices, this means we have to build our own
reasoning infrastructure for them, as neither Isabelle/HOL nor HOL4 nor
HOLlight support them with libraries. Also, reasoning about graphs and
matrices can be a hassle in theorem provers, because
we have to be able to combine automata.  Consider for
example the operation of sequencing two automata, say $A_1$ and $A_2$, by
connecting the accepting states of $A_1$ to the initial state of $A_2$:

\begin{center}
an automaton that recognises all its strings (Definition~\ref{baddef}). This
is because we cannot make a definition in HOL that is only polymorphic in
the state type, but not in the predicate for regularity; and there is no
type quantification available in HOL (unlike in Coq, for
example).\footnote{Slind already pointed out this problem in an email to the
HOL4 mailing list on 21st April 2005.}
%$^,$\footnote{While in Coq one can avoid
%this particular problem, all other difficulties we point out below still apply.}

An alternative, which provides us with a single type for states of automata,
is to give every state node an identity, for example a natural number, and
then be careful to rename these identities apart whenever connecting two
automata. This results in clunky proofs establishing that properties are
reason about.

Functions are much better supported in Isabelle/HOL, but they still lead to
similar problems as with graphs.  Composing, for example, two
non-deterministic automata in parallel also requires the formalisation of
disjoint unions. \citeN{Nipkow98} dismisses the option of
using identities for this, because, according to him, it leads to ``messy
proofs''. Since he does not need to define what regular languages are,
Nipkow opts for a variant of \eqref{disjointunion} using bit lists, but
writes\smallskip

`` & If the reader finds the above treatment in terms of bit lists revoltingly
     concrete, I cannot disagree. A more abstract approach is clearly desirable.''
\end{tabular}
\end{quote}\smallskip


%\noindent
%Moreover, it is not so clear how to conveniently impose a finiteness
%condition upon functions in order to represent \emph{finite} automata. The
%best is probably to resort to more advanced reasoning frameworks, such as
%\emph{locales} or \emph{type classes}, which are \emph{not} available in all
%HOL-based theorem provers.

Because of these problems with representing automata, there seems to
be no substantial formalisation of automata theory and regular languages
carried out in HOL-based theorem provers. \citeN{Nipkow98} establishes
the link between regular expressions and automata in the context of
lexing. \citeN{BerghoferReiter09} formalise automata
working over bit strings in the context of Presburger arithmetic.  The only
larger formalisations of automata theory are carried out in Nuprl by
\citeN{Constable00} and in Coq by \citeN{Filliatre97}, \citeN{Almeidaetal10}
and \citeN{Braibant12}.

%Also, one might consider automata as just convenient `vehicles' for
%establishing properties about regular languages.  However, paper proofs
%about automata often involve subtle side-conditions which are easily
%overlooked, but which make formal reasoning rather painful. For example
%Kozen's proof of the Myhill-Nerode Theorem requires that automata do not
%have inaccessible states \cite{Kozen97}. Another subtle side-condition is
%completeness of automata, that is automata need to have total transition
%functions and at most one `sink' state from which there is no connection to
%a final state (Brzozowski mentions this side-condition in the context of
%state complexity of automata \cite{Brzozowski10}, but it is also needed
%in the usual construction of the complement automaton). Such side-conditions mean
%that if we define a regular language as one for which there exists \emph{a}
%finite automaton that recognises all its strings (see
%Definition~\ref{baddef}), then we need a lemma which ensures that another
%equivalent one can be found satisfying the side-condition, and also need to
%make sure our operations on automata preserve them. Unfortunately, such
%`little' and `obvious' lemmas make formalisations of automata theory
%hair-pulling experiences.

In this paper, we will not attempt to formalise automata theory in
Isabelle/HOL nor will we attempt to formalise automata proofs from the
literature, but take a different approach to regular languages than is
usually taken. Instead of defining a regular language as one where there
that matches all strings of @{text "A"}.
\end{dfntn}

\noindent
And then `forget' automata completely.
The reason is that regular expressions %, unlike graphs, matrices and
%functions,
can be defined as an inductive datatype, and a reasoning
infrastructure for them (like induction and recursion) comes for free in
HOL. %Moreover, no side-conditions will be needed for regular expressions,
%like we need for automata.
This convenience of regular expressions has
recently been exploited in HOL4 with a formalisation of regular expression
matching based on derivatives by \citeN{OwensSlind08}, and with an equivalence
checker for regular expressions in Isabelle/HOL by \citeN{KraussNipkow11}.  The
main purpose of this paper is to show that a central result about regular
languages---the Myhill-Nerode Theorem---can be recreated by only using
regular expressions. This theorem gives necessary and sufficient conditions
for when a language is regular. As a corollary of this theorem we can easily
establish the usual closure properties, including complementation, for
regular languages. We use the Continuation Lemma, which is also a corollary
of the Myhill-Nerode Theorem, for establishing
the non-regularity of the language @{text "a\<^isup>nb\<^isup>n"}.\medskip

\noindent
{\bf Contributions:} There is an extensive literature on regular languages.
To the best of our knowledge, our proof of the Myhill-Nerode Theorem is the first
that is based only on regular expressions. The part of this theorem stating
that finitely many partitions imply regularity of the language is proved by
an argument about solving equational systems.  This argument appears to be
folklore. For the other part, we give two proofs: one direct proof using
certain tagging-functions, and another indirect proof using Antimirov's
partial derivatives \citeyear{Antimirov95}. Again, to the best of our knowledge, the
tagging-functions have not been used before for establishing the Myhill-Nerode
Theorem. Derivatives of regular expressions have recently been used quite
widely in the literature; partial derivatives, in contrast, attract much
less attention. However, partial derivatives are more suitable in the
context of the Myhill-Nerode Theorem, since it is easier to establish
\end{tabular}
\end{center}

\noindent
where @{text "@"} is the list-append operation. The Kleene-star of a language @{text A}
is defined as the union over all powers, namely @{thm star_def[where A="A", THEN eq_reflection]}.
In the paper
we will make use of the following properties of these constructions.

\begin{prpstn}\label{langprops}\mbox{}\\
\begin{tabular}{@ {}lp{10cm}}
(i)   & @{thm star_unfold_left}     \\
the strings in @{term "A \<up> (Suc n)"} must be longer than @{text n}.
Property @{text "(iv)"} states that a non-empty string in @{term "A\<star>"} can
always be split up into a non-empty prefix belonging to @{text "A"} and the
rest being in @{term "A\<star>"}. We omit
the proofs for these properties, but invite the reader to consult our
formalisation.\footnote{Available under \citeN{myhillnerodeafp11} in the Archive of Formal Proofs at\\
\url{http://afp.sourceforge.net/entries/Myhill-Nerode.shtml}.}

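As an informal aside (not part of the formalisation), Properties (i) and (iv) can be
checked outside the theorem prover on bounded approximations of languages. The
Python sketch below is our own encoding: languages are finite sets of strings and
the length bound @{text MAX} is a choice of ours to keep the Kleene star computable.

```python
# Informal check of two star properties; not part of the formalisation.
# Languages are finite sets of strings, truncated at a length bound MAX
# (our choice) so that the Kleene star stays computable.
MAX = 6

def conc(A, B):
    """Language concatenation, truncated at MAX."""
    return {u + v for u in A for v in B if len(u + v) <= MAX}

def power(A, n):
    """A ^ n, with A ^ 0 = {""}."""
    L = {""}
    for _ in range(n):
        L = conc(L, A)
    return L

def star(A):
    """A* as the union over all powers (up to the length bound)."""
    return set().union(*(power(A, n) for n in range(MAX + 1)))

A = {"a", "bb"}
S = star(A)
# Property (i): A* equals {[]} union A concatenated with A* (up to the bound)
assert S == {""} | conc(A, S)
# Property (iv): every non-empty string in A* has a non-empty
# prefix in A with the rest again in A*
assert all(any(s.startswith(u) and s[len(u):] in S for u in A)
           for s in S if s)
```

The truncation makes both sides of each property finite, so the set equalities can be
tested directly.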
The notation in Isabelle/HOL for the quotient of a language @{text A}
according to an equivalence relation @{term REL} is @{term "A // REL"}. We
will write @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined as
\mbox{@{text "{y | y \<approx> x}"}}, and have @{text "x \<approx> y"} if and only if @{text
involving equivalence classes of languages. For this we will use Arden's Lemma
(see for example \cite[Page 100]{Sakarovitch09}),
which solves equations of the form @{term "X = A \<cdot> X \<union> B"} provided
@{term "[] \<notin> A"}. However, we will need the following `reversed'
version of Arden's Lemma (`reversed' in the sense of changing the order of @{term "A \<cdot> X"} to
\mbox{@{term "X \<cdot> A"}}).\footnote{The details of the proof for the reversed Arden's Lemma
are given in the Appendix.}
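Before the formal statement, the content of the reversed version can be
sanity-checked informally: with $[] \notin A$, the solution of
$X = X \cdot A \,\cup\, B$ is $B \cdot A^\star$. The Python sketch below (our
own encoding, with a length bound of our choosing) tests this on small finite
languages; it is an illustration only, not part of the formalisation.

```python
# Bounded check of the reversed Arden equation  X = X.A  u  B,
# whose solution is X = B.A* provided the empty string is not in A.
# The truncation bound MAX (our choice) keeps all languages finite.
MAX = 7

def conc(A, B):
    """Language concatenation, truncated at MAX."""
    return {u + v for u in A for v in B if len(u + v) <= MAX}

def star(A):
    """A* up to the length bound."""
    S, L = {""}, {""}
    for _ in range(MAX):
        L = conc(L, A)
        S |= L
    return S

A = {"a", "ab"}          # the empty string must not be in A
B = {"", "b"}
X = conc(B, star(A))     # the claimed solution  B.A*
assert X == conc(X, A) | B   # it satisfies the reversed equation
```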

\begin{lmm}[(Reversed Arden's Lemma)]\label{arden}\mbox{}\\
If @{thm (prem 1) reversed_Arden} then
@{thm (lhs) reversed_Arden} if and only if
@{thm (rhs) reversed_Arden}.
 & @{text"|"} & @{term "Star r"}
\end{tabular}
\end{center}

\noindent
and the language matched by a regular expression is defined by recursion as

\begin{center}
\begin{tabular}{r@ {\hspace{2mm}}c@ {\hspace{2mm}}l}
@{thm (lhs) lang.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) lang.simps(1)}\\
@{thm (lhs) lang.simps(2)} & @{text "\<equiv>"} & @{thm (rhs) lang.simps(2)}\\
\end{tabular}
\end{center}
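As an informal aside, the datatype and an executable matcher can be mirrored
outside the theorem prover. The Python sketch below uses Brzozowski derivatives
in the style of the HOL4 work by \citeN{OwensSlind08} mentioned earlier; the
encoding and all names are ours, not the paper's.

```python
# The regular-expression datatype, mirrored as Python classes, together
# with a matcher based on Brzozowski derivatives.  Encoding is ours.
from dataclasses import dataclass

class Rexp: pass

@dataclass(frozen=True)
class Zero(Rexp): pass                 # matches no string

@dataclass(frozen=True)
class One(Rexp): pass                  # matches only the empty string

@dataclass(frozen=True)
class Atom(Rexp): c: str               # matches the single character c

@dataclass(frozen=True)
class Plus(Rexp): r1: Rexp; r2: Rexp   # alternative

@dataclass(frozen=True)
class Times(Rexp): r1: Rexp; r2: Rexp  # sequence

@dataclass(frozen=True)
class Star(Rexp): r: Rexp              # iteration

def nullable(r):
    """Does r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Plus):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Times):
        return nullable(r.r1) and nullable(r.r2)
    return False                       # Zero and Atom

def deriv(c, r):
    """The language of deriv(c, r) is { s | c followed by s is in lang r }."""
    if isinstance(r, Atom):
        return One() if r.c == c else Zero()
    if isinstance(r, Plus):
        return Plus(deriv(c, r.r1), deriv(c, r.r2))
    if isinstance(r, Times):
        d = Times(deriv(c, r.r1), r.r2)
        return Plus(d, deriv(c, r.r2)) if nullable(r.r1) else d
    if isinstance(r, Star):
        return Times(deriv(c, r.r), Star(r.r))
    return Zero()                      # Zero and One

def matches(r, s):
    """Fold the derivative over s, then test for the empty string."""
    for c in s:
        r = deriv(c, r)
    return nullable(r)
```

For instance, with this encoding the expression for $(a + ab)^\star$ accepts
the string @{text "aab"} and rejects @{text "b"}.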
   558 
   564 
   559   Given a finite set of regular expressions @{text rs}, we will make use of the operation of generating 
   565   Given a finite set of regular expressions @{text rs}, we will make use of the operation of generating 
   560   a regular expression that matches the union of all languages of @{text rs}. 
   566   a regular expression that matches the union of all languages of @{text rs}. 
   561   This definion is not trivial in a theorem prover, but since 
   567   This definion is not trivial in a theorem prover, because @{text rs} (being a set) is unordered, 
       
   568   but the regular expression needs an order. Since 
   562   we only need to know the 
   569   we only need to know the 
   563   existence
   570   existence
   564   of such a regular expression, we can use Isabelle/HOL's @{const "fold_graph"} and Hilbert's
   571   of such a regular expression, we can use Isabelle/HOL's @{const "fold_graph"} and Hilbert's
   565   @{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This operation, roughly speaking, folds @{const PLUS} over the 
   572   choice operator, written @{text "SOME"} in Isabelle/HOL, for defining @{term "\<Uplus>rs"}. 
       
   573   This operation, roughly speaking, folds @{const PLUS} over the 
   566   set @{text rs} with @{const ZERO} for the empty set. We can prove that for a finite set @{text rs}
   574   set @{text rs} with @{const ZERO} for the empty set. We can prove that for a finite set @{text rs}
   567   %
   575   %
   568   \begin{equation}\label{uplus}
   576   \begin{equation}\label{uplus}
   569   \mbox{@{thm (lhs) folds_plus_simp} @{text "= \<Union> (\<calL> ` rs)"}}
   577   \mbox{@{thm (lhs) folds_plus_simp} @{text "= \<Union> (\<calL> ` rs)"}}
   570   \end{equation}
   578   \end{equation}
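The point of this construction can be illustrated informally: folding a binary
@{const PLUS} over a set requires picking some traversal order, and different
orders yield different regular expressions, but all of them have the same
language. The Python sketch below (our own encoding, restricted to the
@{const PLUS}-fragment) demonstrates this.

```python
# Folding PLUS over a finite set of regular expressions, with ZERO for
# the empty set.  The resulting expression depends on the traversal
# order, but its language does not.  Encoding is ours.
from functools import reduce

ZERO, ONE = ("ZERO",), ("ONE",)
def ATOM(c):    return ("ATOM", c)
def PLUS(r, s): return ("PLUS", r, s)

def big_plus(rs):
    """Fold PLUS over rs; ZERO for the empty set."""
    return reduce(PLUS, rs, ZERO)

def lang(r):
    """The language of r (PLUS-fragment only)."""
    tag = r[0]
    if tag == "ZERO": return set()
    if tag == "ONE":  return {""}
    if tag == "ATOM": return {r[1]}
    return lang(r[1]) | lang(r[2])

rs = {ATOM("a"), ATOM("b"), ONE}
r1 = big_plus(sorted(rs))                  # one traversal order
r2 = big_plus(sorted(rs, reverse=True))    # another
assert r1 != r2                            # different expressions ...
assert lang(r1) == lang(r2) == {"", "a", "b"}   # ... same language
```

This is why only the existence of @{term "\<Uplus>rs"} matters: any choice of
order is as good as any other.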

\noindent
In our running example, @{text "X\<^isub>2"} is the only
equivalence class in @{term "finals {[c]}"}.
It is straightforward to show that in general
%
\begin{equation}\label{finalprops}
@{thm lang_is_union_of_finals}\hspace{15mm}
@{thm finals_in_partitions}
\end{equation}

equivalence classes and only finitely many characters.
The term @{text "\<lambda>(ONE)"} in the first equation acts as a marker for the initial state, that
is the equivalence class
containing the empty string @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
single `initial' state in the equational system, which is different from
the method by \citeN{Brzozowski64}, where he marks the
`terminal' states. We are forced to set up the equational system in our
way, because the Myhill-Nerode Relation determines the `direction' of the
transitions---the successor `state' of an equivalence class @{text Y} can
be reached by adding a character to the end of @{text Y}. This is also the
reason why we have to use our reversed version of Arden's Lemma.}
In our running example we have the initial equational system
%
\begin{equation}\label{exmpcs}
\mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
@{term "X\<^isub>1"} & @{text "="} & @{text "\<lambda>(ONE)"}\\
@{term "X\<^isub>2"} & @{text "="} & @{text "(X\<^isub>1, ATOM c)"}\\
@{term "X\<^isub>3"} & @{text "="} & @{text "(X\<^isub>1, ATOM d\<^isub>1) + \<dots> + (X\<^isub>1, ATOM d\<^isub>n)"}\\
 & & @{text "else"}~@{term "{Trn Y (ATOM c)| Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X}"}\\
@{thm (lhs) Init_def}     & @{text "\<equiv>"} & @{thm (rhs) Init_def}
\end{tabular}}
\end{equation}

\noindent
Because we use sets of terms
for representing the right-hand sides of equations, we can
prove \eqref{inv1} and \eqref{inv2} more concisely as
%
regular expression to @{text "xrhs"} and union it up with @{text rhs'}. When we use
the substitution operation we will arrange it so that @{text "xrhs"} does not contain
any occurrence of @{text X}. For example, substituting the first equation in
\eqref{exmpcs} into the right-hand side of the second, thus eliminating the equivalence
class @{text "X\<^isub>1"}, gives us the equation
%
\begin{equation}\label{exmpresult}
\mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
@{term "X\<^isub>2"} & @{text "="} & @{text "\<lambda>(TIMES ONE (ATOM c))"}
\end{tabular}}
\end{equation}
   900 
   906 
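For reference, the @{const Arden} operation used in this elimination step is justified by Arden's Lemma. We paraphrase it here for languages, in the (reversed) right-appending form that matches the shape of our equations:

```latex
% Arden's Lemma (our paraphrase, right-appending form):
\begin{equation*}
\text{if } [] \notin A \text{ then}\qquad
X \;=\; X \cdot A \,\cup\, B
\quad\text{if and only if}\quad
X \;=\; B \cdot A^{\star}
\end{equation*}
```

The side-condition @{text "[] \<notin> A"} is what makes the solution unique; without it @{text "B \<cdot> A\<^isup>\<star>"} is still a solution, but not the only one.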
   901   With these two operations in place, we can define the operation that removes one equation
   907   With these two operations in place, we can define the operation that removes one equation
    902   from an equational system @{text ES}. The operation @{const Subst_all}
    908   from an equational system @{text ES}. The operation @{const Subst_all}
   942   @{thm Solve_def}
   948   @{thm Solve_def}
   943   \end{center}
   949   \end{center}
   944 
   950 
   945   \noindent
   951   \noindent
   946   We are not concerned here with the definition of this operator (see
   952   We are not concerned here with the definition of this operator (see
   947   Berghofer and Nipkow \cite{BerghoferNipkow00} for example), but note that we
   953   \cite{BerghoferNipkow00} for example), but note that we
   948   eliminate in each @{const Iter}-step a single equation, and therefore have a
   954   eliminate in each @{const Iter}-step a single equation, and therefore have a
   949   well-founded termination order by taking the cardinality of the equational
   955   well-founded termination order by taking the cardinality of the equational
   950   system @{text ES}. This enables us to prove properties about our definition
   956   system @{text ES}. This enables us to prove properties about our definition
   951   of @{const Solve} when we `call' it with the equivalence class @{text X} and
   957   of @{const Solve} when we `call' it with the equivalence class @{text X} and
   952   the initial equational system @{term "Init (UNIV // \<approx>A)"} from
   958   the initial equational system @{term "Init (UNIV // \<approx>A)"} from
   953   \eqref{initcs} using the principle:
   959   \eqref{initcs} using the principle:
   954 
   960   %
   955   \begin{equation}\label{whileprinciple}
   961   \begin{equation}\label{whileprinciple}
   956   \mbox{\begin{tabular}{l}
   962   \mbox{\begin{tabular}{l}
   957   @{term "invariant (Init (UNIV // \<approx>A))"} \\
   963   @{term "invariant (Init (UNIV // \<approx>A))"} \\
   958   @{term "\<forall>ES. invariant ES \<and> Cond ES \<longrightarrow> invariant (Iter X ES)"}\\
   964   @{term "\<forall>ES. invariant ES \<and> Cond ES \<longrightarrow> invariant (Iter X ES)"}\\
   959   @{term "\<forall>ES. invariant ES \<and> Cond ES \<longrightarrow> card (Iter X ES) < card ES"}\\
   965   @{term "\<forall>ES. invariant ES \<and> Cond ES \<longrightarrow> card (Iter X ES) < card ES"}\\
  1022   \end{proof}
  1028   \end{proof}
  1023 
  1029 
  1024   \noindent
  1030   \noindent
  1025   Next we show that @{text Iter} preserves the invariant.
  1031   Next we show that @{text Iter} preserves the invariant.
  1026 
  1032 
  1027   \begin{lmm}\label{iterone}
  1033   \begin{lmm}\label{iterone} If
  1028   @{thm[mode=IfThen] iteration_step_invariant[where xrhs="rhs"]}
  1034   @{thm (prem 1) iteration_step_invariant[where xrhs="rhs"]},
       
  1035   @{thm (prem 2) iteration_step_invariant[where xrhs="rhs"]} and
       
  1036   @{thm (prem 3) iteration_step_invariant[where xrhs="rhs"]}, then
       
  1037   @{thm (concl) iteration_step_invariant[where xrhs="rhs"]}.
  1029   \end{lmm}
  1038   \end{lmm}
  1030 
  1039 
  1031   \begin{proof} 
  1040   \begin{proof} 
  1032   The argument boils down to choosing an equation @{text "Y = yrhs"} to be eliminated
  1041   The argument boils down to choosing an equation @{text "Y = yrhs"} to be eliminated
   1033   and showing that @{term "Subst_all (ES - {(Y, yrhs)}) Y (Arden Y yrhs)"} 
   1042   and showing that @{term "Subst_all (ES - {(Y, yrhs)}) Y (Arden Y yrhs)"} 
  1034   preserves the invariant.
  1043   preserves the invariant.
  1035   We prove this as follows:
  1044   We prove this as follows:
  1036 
  1045 
  1037   \begin{center}
  1046   \begin{center}
  1038   \begin{tabular}{@ {}l@ {}}
  1047   \begin{tabular}{@ {}l@ {}}
  1039   @{text "\<forall> ES."}\\ \mbox{}\hspace{5mm}@{thm (prem 1) Subst_all_satisfies_invariant} implies
  1048   @{text "\<forall>ES."}~@{thm (prem 1) Subst_all_satisfies_invariant} implies
  1040   @{thm (concl) Subst_all_satisfies_invariant}
  1049   @{thm (concl) Subst_all_satisfies_invariant}
  1041   \end{tabular}
  1050   \end{tabular}
  1042   \end{center}
  1051   \end{center}
  1043 
  1052 
  1044   \noindent
  1053   \noindent
  1056   \end{proof}
  1065   \end{proof}
  1057 
  1066 
  1058   \noindent
  1067   \noindent
  1059   We also need the fact that @{text Iter} decreases the termination measure.
  1068   We also need the fact that @{text Iter} decreases the termination measure.
  1060 
  1069 
  1061   \begin{lmm}\label{itertwo}\mbox{}\\
  1070   \begin{lmm}\label{itertwo} If
  1062   @{thm[mode=IfThen] iteration_step_measure[simplified (no_asm), where xrhs="rhs"]}
  1071   @{thm (prem 1) iteration_step_measure[simplified (no_asm), where xrhs="rhs"]},
       
  1072   @{thm (prem 2) iteration_step_measure[simplified (no_asm), where xrhs="rhs"]} and
       
  1073   @{thm (prem 3) iteration_step_measure[simplified (no_asm), where xrhs="rhs"]}, then\\
       
  1074   \mbox{@{thm (concl) iteration_step_measure[simplified (no_asm), where xrhs="rhs"]}}.
  1063   \end{lmm}
  1075   \end{lmm}
  1064 
  1076 
  1065   \begin{proof}
  1077   \begin{proof}
  1066   By assumption we know that @{text "ES"} is finite and has more than one element.
  1078   By assumption we know that @{text "ES"} is finite and has more than one element.
  1067   Therefore there must be an element @{term "(Y, yrhs) \<in> ES"} with 
  1079   Therefore there must be an element @{term "(Y, yrhs) \<in> ES"} with 
  1148   expressions corresponding to equivalence classes not in @{term "finals A"},
  1160   expressions corresponding to equivalence classes not in @{term "finals A"},
  1149   then we obtain a regular expression for the complement language @{term "- A"}.
  1161   then we obtain a regular expression for the complement language @{term "- A"}.
  1150   This is similar to the usual construction of a `complement automaton'.
  1162   This is similar to the usual construction of a `complement automaton'.
  1151 *}
  1163 *}
  1152 
  1164 
  1153 
       
  1154 
       
  1155 
       
  1156 section {* Myhill-Nerode, Second Part *}
  1165 section {* Myhill-Nerode, Second Part *}
  1157 
  1166 
  1158 text {*
  1167 text {*
  1159   \noindent
  1168   \noindent
  1160   In this section we will give a proof for establishing the second 
  1169   In this section we will give a proof for establishing the second 
  1188   \noindent
  1197   \noindent
   1189   Much more interesting, however, are the inductive cases. They seem hard to solve 
   1198   Much more interesting, however, are the inductive cases. They seem hard to solve 
  1190   directly. The reader is invited to try. 
  1199   directly. The reader is invited to try. 
  1191 
  1200 
  1192   In order to see how our proof proceeds consider the following suggestive picture 
  1201   In order to see how our proof proceeds consider the following suggestive picture 
  1193   taken from Constable et al \cite{Constable00}:
  1202   given by \citeN{Constable00}:
  1194 
  1203   %
  1195   \begin{equation}\label{pics}
  1204   \begin{equation}\label{pics}
  1196   \mbox{\begin{tabular}{c@ {\hspace{10mm}}c@ {\hspace{10mm}}c}
  1205   \mbox{\begin{tabular}{c@ {\hspace{10mm}}c@ {\hspace{10mm}}c}
  1197   \begin{tikzpicture}[scale=1]
  1206   \begin{tikzpicture}[scale=0.95]
  1198   %Circle
  1207   %Circle
  1199   \draw[thick] (0,0) circle (1.1);    
  1208   \draw[thick] (0,0) circle (1.1);    
  1200   \end{tikzpicture}
  1209   \end{tikzpicture}
  1201   &
  1210   &
  1202   \begin{tikzpicture}[scale=1]
  1211   \begin{tikzpicture}[scale=0.95]
  1203   %Circle
  1212   %Circle
  1204   \draw[thick] (0,0) circle (1.1);    
  1213   \draw[thick] (0,0) circle (1.1);    
  1205   %Main rays
  1214   %Main rays
  1206   \foreach \a in {0, 90,...,359}
  1215   \foreach \a in {0, 90,...,359}
  1207     \draw[very thick] (0, 0) -- (\a:1.1);
  1216     \draw[very thick] (0, 0) -- (\a:1.1);
  1208   \foreach \a / \l in {45/1, 135/2, 225/3, 315/4}
  1217   \foreach \a / \l in {45/1, 135/2, 225/3, 315/4}
  1209       \draw (\a: 0.65) node {$a_\l$};
  1218       \draw (\a: 0.65) node {$a_\l$};
  1210   \end{tikzpicture}
  1219   \end{tikzpicture}
  1211   &
  1220   &
  1212   \begin{tikzpicture}[scale=1]
  1221   \begin{tikzpicture}[scale=0.95]
  1213   %Circle
  1222   %Circle
  1214   \draw[thick] (0,0) circle (1.1);    
  1223   \draw[thick] (0,0) circle (1.1);    
  1215   %Main rays
  1224   %Main rays
  1216   \foreach \a in {0, 45,...,359}
  1225   \foreach \a in {0, 45,...,359}
  1217      \draw[very thick] (0, 0) -- (\a:1.1);
  1226      \draw[very thick] (0, 0) -- (\a:1.1);
  1356   \noindent
  1365   \noindent
   1357   We can therefore analyse how the strings @{text "x @ z"} are in the language
   1366   We can therefore analyse how the strings @{text "x @ z"} are in the language
  1358   @{text A} and then construct an appropriate tagging-function to infer that
  1367   @{text A} and then construct an appropriate tagging-function to infer that
  1359   @{term "y @ z"} are also in @{text A}.  For this we will use the notion of
  1368   @{term "y @ z"} are also in @{text A}.  For this we will use the notion of
  1360   the set of all possible \emph{partitions} of a string:
  1369   the set of all possible \emph{partitions} of a string:
  1361 
  1370   %
  1362   \begin{equation}
  1371   \begin{equation}
  1363   @{thm Partitions_def}
  1372   @{thm Partitions_def}
  1364   \end{equation}
  1373   \end{equation}
  1365 
  1374 
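Concretely, @{term Partitions} collects every way of splitting a string into a prefix/suffix pair. A minimal executable sketch (the Python encoding of strings and the function name are ours, not part of the formalisation):

```python
def partitions(s):
    # All pairs (x_p, x_s) with x_p + x_s == s, including the two trivial splits.
    return [(s[:i], s[i:]) for i in range(len(s) + 1)]
```

For instance, `partitions("ab")` yields `[("", "ab"), ("a", "b"), ("ab", "")]`.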
  1366   \noindent
  1375   \noindent
  1626   the second direction of the Myhill-Nerode Theorem, it is sufficient to find 
  1635   the second direction of the Myhill-Nerode Theorem, it is sufficient to find 
  1627   a more refined relation than @{term "\<approx>(lang r)"} for which we can
  1636   a more refined relation than @{term "\<approx>(lang r)"} for which we can
  1628   show that there are only finitely many equivalence classes. So far we 
  1637   show that there are only finitely many equivalence classes. So far we 
  1629   showed this directly by induction on @{text "r"} using tagging-functions. 
  1638   showed this directly by induction on @{text "r"} using tagging-functions. 
  1630   However, there is also  an indirect method to come up with such a refined 
  1639   However, there is also  an indirect method to come up with such a refined 
  1631   relation by using derivatives of regular expressions~\cite{Brzozowski64}. 
  1640   relation by using derivatives of regular expressions introduced by \citeN{Brzozowski64}. 
  1632 
  1641 
  1633   Assume the following two definitions for the \emph{left-quotient} of a language,
  1642   Assume the following two definitions for the \emph{left-quotient} of a language,
  1634   which we write as @{term "Deriv c A"} and @{term "Derivs s A"} where @{text c}
  1643   which we write as @{term "Deriv c A"} and @{term "Derivs s A"} where @{text c}
  1635   is a character and @{text s} a string, respectively:
  1644   is a character and @{text s} a string, respectively:
  1636 
  1645 
  1650   
  1659   
  1651   \noindent
  1660   \noindent
  1652   where we apply the left-quotient to a set of languages and then combine the results.
  1661   where we apply the left-quotient to a set of languages and then combine the results.
  1653   Clearly we have the following equivalence between the Myhill-Nerode Relation
  1662   Clearly we have the following equivalence between the Myhill-Nerode Relation
  1654   (Definition~\ref{myhillneroderel}) and left-quotients
  1663   (Definition~\ref{myhillneroderel}) and left-quotients
  1655 
  1664   %
  1656   \begin{equation}\label{mhders}
  1665   \begin{equation}\label{mhders}
  1657   @{term "x \<approx>A y"} \hspace{4mm}\text{if and only if}\hspace{4mm} @{term "Derivs x A = Derivs y A"}
  1666   @{term "x \<approx>A y"} \hspace{4mm}\text{if and only if}\hspace{4mm} @{term "Derivs x A = Derivs y A"}
  1658   \end{equation}
  1667   \end{equation}
  1659 
  1668 
  1660   \noindent
  1669   \noindent
  1661   It is also straightforward to establish the following properties of left-quotients
  1670   It is also straightforward to establish the following properties of left-quotients
  1662   
  1671   % 
  1663   \begin{equation}
  1672   \begin{equation}
  1664   \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{2mm}}l}
  1673   \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{2mm}}l}
  1665   @{thm (lhs) Deriv_simps(1)} & $=$ & @{thm (rhs) Deriv_simps(1)}\\
  1674   @{thm (lhs) Deriv_simps(1)} & $=$ & @{thm (rhs) Deriv_simps(1)}\\
  1666   @{thm (lhs) Deriv_simps(2)} & $=$ & @{thm (rhs) Deriv_simps(2)}\\
  1675   @{thm (lhs) Deriv_simps(2)} & $=$ & @{thm (rhs) Deriv_simps(2)}\\
  1667   @{thm (lhs) Deriv_simps(3)} & $=$ & @{thm (rhs) Deriv_simps(3)}\\
  1676   @{thm (lhs) Deriv_simps(3)} & $=$ & @{thm (rhs) Deriv_simps(3)}\\
  1681   where we use Property~\ref{langprops}@{text "(i)"} in order to infer that
  1690   where we use Property~\ref{langprops}@{text "(i)"} in order to infer that
  1682   @{term "Deriv c (A\<star>) = Deriv c (A \<cdot> A\<star>)"}. We can then complete the proof by
  1691   @{term "Deriv c (A\<star>) = Deriv c (A \<cdot> A\<star>)"}. We can then complete the proof by
  1683   using the fifth equation and noting that @{term "Deriv c (A\<star>) \<subseteq> (Deriv
  1692   using the fifth equation and noting that @{term "Deriv c (A\<star>) \<subseteq> (Deriv
  1684   c A) \<cdot> A\<star>"} provided @{text "[] \<in> A"}. 
  1693   c A) \<cdot> A\<star>"} provided @{text "[] \<in> A"}. 
  1685 
  1694 
  1686   Brzozowski observed that the left-quotients for languages of
  1695   \citeN{Brzozowski64} observed that the left-quotients for languages of
  1687   regular expressions can be calculated directly using the notion of
  1696   regular expressions can be calculated directly using the notion of
  1688   \emph{derivatives of a regular expression} \cite{Brzozowski64}. We define
  1697   \emph{derivatives of a regular expression}. We define
  1689   this notion in Isabelle/HOL as follows:
  1698   this notion in Isabelle/HOL as follows:
  1690 
  1699   %
  1691   \begin{center}
  1700   \begin{center}
  1692   \begin{longtable}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
  1701   \begin{longtable}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
  1693   @{thm (lhs) deriv.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) deriv.simps(1)}\\
  1702   @{thm (lhs) deriv.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) deriv.simps(1)}\\
  1694   @{thm (lhs) deriv.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) deriv.simps(2)}\\
  1703   @{thm (lhs) deriv.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) deriv.simps(2)}\\
  1695   @{thm (lhs) deriv.simps(3)[where c'="d"]}  & @{text "\<equiv>"} & @{thm (rhs) deriv.simps(3)[where c'="d"]}\\
  1704   @{thm (lhs) deriv.simps(3)[where c'="d"]}  & @{text "\<equiv>"} & @{thm (rhs) deriv.simps(3)[where c'="d"]}\\
  1709   \noindent
  1718   \noindent
  1710   The last two clauses extend derivatives from characters to strings. The
  1719   The last two clauses extend derivatives from characters to strings. The
  1711   boolean function @{term "nullable r"} needed in the @{const Times}-case tests
  1720   boolean function @{term "nullable r"} needed in the @{const Times}-case tests
  1712   whether a regular expression can recognise the empty string. It can be defined as 
  1721   whether a regular expression can recognise the empty string. It can be defined as 
  1713   follows.
  1722   follows.
  1714 
  1723   %
  1715   \begin{center}
  1724   \begin{center}
  1716   \begin{tabular}{c}
  1725   \begin{longtable}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
  1717   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
       
  1718   @{thm (lhs) nullable.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(1)}\\
  1726   @{thm (lhs) nullable.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(1)}\\
  1719   @{thm (lhs) nullable.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(2)}\\
  1727   @{thm (lhs) nullable.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(2)}\\
  1720   @{thm (lhs) nullable.simps(3)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(3)}\\
  1728   @{thm (lhs) nullable.simps(3)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(3)}\\
  1721   @{thm (lhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}  
  1729   @{thm (lhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}  
  1722      & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  1730      & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  1723   @{thm (lhs) nullable.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}  
  1731   @{thm (lhs) nullable.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}  
  1724      & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  1732      & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  1725   @{thm (lhs) nullable.simps(6)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(6)}\\
  1733   @{thm (lhs) nullable.simps(6)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(6)}
  1726   \end{tabular}
  1734   \end{longtable}
  1727   \end{tabular}
       
  1728   \end{center}
  1735   \end{center}
  1729 
  1736 
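Outside the formalisation, the clauses for @{const deriv} and @{const nullable} can be transliterated into a short executable sketch; the tuple encoding of the regular-expression constructors and the helper names below are ours, not part of the development. The derivative-based matcher at the end illustrates the point of \eqref{Dersders}: a string is accepted exactly when the derivative of the regular expression by that string is nullable.

```python
# Brzozowski derivatives, transliterated from the clauses above.
# Regular expressions are encoded as nested tuples (our own encoding):
#   ("ZERO",), ("ONE",), ("ATOM", c), ("PLUS", r1, r2),
#   ("TIMES", r1, r2), ("STAR", r)

def nullable(r):
    """Does r accept the empty string?"""
    tag = r[0]
    if tag == "ZERO":  return False
    if tag == "ONE":   return True
    if tag == "ATOM":  return False
    if tag == "PLUS":  return nullable(r[1]) or nullable(r[2])
    if tag == "TIMES": return nullable(r[1]) and nullable(r[2])
    if tag == "STAR":  return True

def deriv(c, r):
    """Derivative of r by the character c."""
    tag = r[0]
    if tag == "ZERO": return ("ZERO",)
    if tag == "ONE":  return ("ZERO",)
    if tag == "ATOM": return ("ONE",) if r[1] == c else ("ZERO",)
    if tag == "PLUS": return ("PLUS", deriv(c, r[1]), deriv(c, r[2]))
    if tag == "TIMES":
        left = ("TIMES", deriv(c, r[1]), r[2])
        return ("PLUS", left, deriv(c, r[2])) if nullable(r[1]) else left
    if tag == "STAR": return ("TIMES", deriv(c, r[1]), r)

def derivs(s, r):
    """Derivative of r by the string s, character by character."""
    for c in s:
        r = deriv(c, r)
    return r

def matches(s, r):
    # s in lang r  iff  [] in lang (derivs s r)  iff  nullable (derivs s r)
    return nullable(derivs(s, r))
```

For instance, with @{text r} encoding @{text "a \<cdot> b\<^isup>\<star>"}, `matches("abb", r)` holds while `matches("ba", r)` does not.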
  1730   \noindent
  1737   \noindent
  1731   By induction on the regular expression @{text r}, respectively on the string @{text s}, 
  1738   By induction on the regular expression @{text r}, respectively on the string @{text s}, 
  1732   one can easily show that left-quotients and derivatives of regular expressions 
  1739   one can easily show that left-quotients and derivatives of regular expressions 
  1733   relate as follows (see for example~\cite{Sakarovitch09}):
  1740   relate as follows (see for example~\cite{Sakarovitch09}):
  1734 
  1741   %
  1735   \begin{equation}\label{Dersders}
  1742   \begin{equation}\label{Dersders}
  1736   \mbox{\begin{tabular}{c}
  1743   \mbox{\begin{tabular}{c}
  1737   @{thm Deriv_deriv}\\
  1744   @{thm Deriv_deriv}\\
  1738   @{thm Derivs_derivs}
  1745   @{thm Derivs_derivs}
  1739   \end{tabular}}
  1746   \end{tabular}}
  1749   @{term "lang (derivs x r) = lang (derivs y r)"}. 
  1756   @{term "lang (derivs x r) = lang (derivs y r)"}. 
  1750   \end{center}
  1757   \end{center}
  1751 
  1758 
  1752   \noindent
  1759   \noindent
  1753   holds and hence
  1760   holds and hence
  1754 
  1761   %
  1755   \begin{equation}
  1762   \begin{equation}
  1756   @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "derivs x r = derivs y r"}
  1763   @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "derivs x r = derivs y r"}
  1757   \end{equation}
  1764   \end{equation}
  1758 
  1765 
  1759 
  1766 
  1770   w.r.t.~the language @{text "(ab)\<^isup>\<star> \<union> (ab)\<^isup>\<star>a"}, which is formally 
  1777   w.r.t.~the language @{text "(ab)\<^isup>\<star> \<union> (ab)\<^isup>\<star>a"}, which is formally 
  1771   written in our notation as \mbox{@{text "{[a,b]}\<^isup>\<star> \<union> ({[a,b]}\<^isup>\<star> \<cdot> {[a]})"}}
  1778   written in our notation as \mbox{@{text "{[a,b]}\<^isup>\<star> \<union> ({[a,b]}\<^isup>\<star> \<cdot> {[a]})"}}
  1772   (see \cite[Page~141]{Sakarovitch09}).
  1779   (see \cite[Page~141]{Sakarovitch09}).
  1773 
  1780 
  1774 
  1781 
  1775   What Brzozowski \cite{Brzozowski64} established is that for every language there
  1782   What \citeN{Brzozowski64} established is that for every language there
   1776   \emph{are} only finitely many `dissimilar' derivatives of a regular
   1783   \emph{are} only finitely many `dissimilar' derivatives of a regular
  1777   expression. Two regular expressions are said to be \emph{similar} provided
  1784   expression. Two regular expressions are said to be \emph{similar} provided
   1778   they can be identified using the @{text "ACI"}-identities:
   1785   they can be identified using the @{text "ACI"}-identities:
  1779 
  1786   %
  1780 
       
  1781   \begin{equation}\label{ACI}
  1787   \begin{equation}\label{ACI}
  1782   \mbox{\begin{tabular}{cl}
  1788   \mbox{\begin{tabular}{cl}
  1783   (@{text A}) & @{term "Plus (Plus r\<^isub>1 r\<^isub>2) r\<^isub>3"} $\equiv$ @{term "Plus r\<^isub>1 (Plus r\<^isub>2 r\<^isub>3)"}\\
  1789   (@{text A}) & @{term "Plus (Plus r\<^isub>1 r\<^isub>2) r\<^isub>3"} $\equiv$ @{term "Plus r\<^isub>1 (Plus r\<^isub>2 r\<^isub>3)"}\\
  1784   (@{text C}) & @{term "Plus r\<^isub>1 r\<^isub>2"} $\equiv$ @{term "Plus r\<^isub>2 r\<^isub>1"}\\
  1790   (@{text C}) & @{term "Plus r\<^isub>1 r\<^isub>2"} $\equiv$ @{term "Plus r\<^isub>2 r\<^isub>1"}\\
  1785   (@{text I}) & @{term "Plus r r"} $\equiv$ @{term "r"}\\
  1791   (@{text I}) & @{term "Plus r r"} $\equiv$ @{term "r"}\\
  1792   but it is very painful in a theorem prover (since there is no
  1798   but it is very painful in a theorem prover (since there is no
  1793   direct characterisation of the set of dissimilar derivatives).
  1799   direct characterisation of the set of dissimilar derivatives).
  1794 
  1800 
  1795 
  1801 
  1796   Fortunately, there is a much simpler approach using \emph{partial
  1802   Fortunately, there is a much simpler approach using \emph{partial
  1797   derivatives}. They were introduced by Antimirov \cite{Antimirov95} and can be defined
  1803   derivatives}. They were introduced by \citeN{Antimirov95} and can be defined
  1798   in Isabelle/HOL as follows:
  1804   in Isabelle/HOL as follows:
  1799 
  1805   %
  1800   \begin{center}
  1806   \begin{center}
  1801   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
  1807   \begin{longtable}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
  1802   @{thm (lhs) pderiv.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) pderiv.simps(1)}\\
  1808   @{thm (lhs) pderiv.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) pderiv.simps(1)}\\
  1803   @{thm (lhs) pderiv.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) pderiv.simps(2)}\\
  1809   @{thm (lhs) pderiv.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) pderiv.simps(2)}\\
  1804   @{thm (lhs) pderiv.simps(3)[where c'="d"]}  & @{text "\<equiv>"} & @{thm (rhs) pderiv.simps(3)[where c'="d"]}\\
  1810   @{thm (lhs) pderiv.simps(3)[where c'="d"]}  & @{text "\<equiv>"} & @{thm (rhs) pderiv.simps(3)[where c'="d"]}\\
  1805   @{thm (lhs) pderiv.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}  
  1811   @{thm (lhs) pderiv.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}  
  1806      & @{text "\<equiv>"} & @{thm (rhs) pderiv.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  1812      & @{text "\<equiv>"} & @{thm (rhs) pderiv.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  1809        @{term "(Timess (pderiv c r\<^isub>1) r\<^isub>2) \<union> (pderiv c r\<^isub>2)"}\\
  1815        @{term "(Timess (pderiv c r\<^isub>1) r\<^isub>2) \<union> (pderiv c r\<^isub>2)"}\\
  1810      & & \phantom{@{text "if"}~@{term "nullable r\<^isub>1"}~}@{text "else"}~%  
  1816      & & \phantom{@{text "if"}~@{term "nullable r\<^isub>1"}~}@{text "else"}~%  
  1811                     @{term "Timess (pderiv c r\<^isub>1) r\<^isub>2"}\\ 
  1817                     @{term "Timess (pderiv c r\<^isub>1) r\<^isub>2"}\\ 
  1812   @{thm (lhs) pderiv.simps(6)}  & @{text "\<equiv>"} & @{thm (rhs) pderiv.simps(6)}\smallskip\\
  1818   @{thm (lhs) pderiv.simps(6)}  & @{text "\<equiv>"} & @{thm (rhs) pderiv.simps(6)}\smallskip\\
  1813   @{thm (lhs) pderivs.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) pderivs.simps(1)}\\
  1819   @{thm (lhs) pderivs.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) pderivs.simps(1)}\\
  1814   @{thm (lhs) pderivs.simps(2)}  & @{text "\<equiv>"} & @{text "\<Union> (pders s) ` (pder c r)"}\\
  1820   @{thm (lhs) pderivs.simps(2)}  & @{text "\<equiv>"} & @{text "\<Union> (pders s) ` (pder c r)"}
  1815   \end{tabular}
  1821   \end{longtable}
  1816   \end{center}
  1822   \end{center}
  1817 
  1823 
  1818   \noindent
  1824   \noindent
  1819   Again the last two clauses extend partial derivatives from characters to strings. 
  1825   Again the last two clauses extend partial derivatives from characters to strings. 
  1820   Unlike `simple' derivatives, the functions for partial derivatives return sets of regular
  1826   Unlike `simple' derivatives, the functions for partial derivatives return sets of regular
  1846   \end{center}
  1852   \end{center}
  1847 
  1853 
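The partial-derivative clauses admit a similar executable sketch (again, the tuple encoding and helper names are ours, not part of the development). Because the result type is a set, unions of alternatives are absorbed automatically, and a simple worklist computes the set of all partial derivatives reachable from a regular expression; the loop terminates because there are only finitely many of them.

```python
# Antimirov's partial derivatives: like Brzozowski derivatives, but returning
# *sets* of regular expressions. Tuple encoding of constructors (ours):
#   ("ZERO",), ("ONE",), ("ATOM", c), ("PLUS", r1, r2),
#   ("TIMES", r1, r2), ("STAR", r)

def nullable(r):
    tag = r[0]
    return (tag in ("ONE", "STAR")
            or (tag == "PLUS" and (nullable(r[1]) or nullable(r[2])))
            or (tag == "TIMES" and nullable(r[1]) and nullable(r[2])))

def timess(rs, r2):
    # The Timess operation: sequence every member of the set rs with r2.
    return {("TIMES", r1, r2) for r1 in rs}

def pderiv(c, r):
    tag = r[0]
    if tag in ("ZERO", "ONE"):
        return set()
    if tag == "ATOM":
        return {("ONE",)} if r[1] == c else set()
    if tag == "PLUS":
        return pderiv(c, r[1]) | pderiv(c, r[2])
    if tag == "TIMES":
        left = timess(pderiv(c, r[1]), r[2])
        return left | pderiv(c, r[2]) if nullable(r[1]) else left
    if tag == "STAR":
        return timess(pderiv(c, r[1]), r)

def pderivs_closure(r, alphabet):
    # Worklist closure: all partial derivatives of r reachable by any string
    # over the given alphabet; the result is always a finite set.
    seen, todo = {r}, [r]
    while todo:
        q = todo.pop()
        for c in alphabet:
            for p in pderiv(c, q):
                if p not in seen:
                    seen.add(p)
                    todo.append(p)
    return seen
```

Since the results are sets, the regular expressions related by the identities in \eqref{ACI} have identical partial derivatives: for example, `pderiv` applied to @{term "Plus r r"} and to @{term "r"} returns the same set.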
  1848   Partial derivatives can be seen as having the @{text "ACI"}-identities already built in: 
  1854   Partial derivatives can be seen as having the @{text "ACI"}-identities already built in: 
  1849   taking the partial derivatives of the
  1855   taking the partial derivatives of the
  1850   regular expressions in \eqref{ACI} gives us in each case
  1856   regular expressions in \eqref{ACI} gives us in each case
  1851   equal sets.  Antimirov \cite{Antimirov95} showed a similar result to
  1857   equal sets.  \citeN{Antimirov95} showed a similar result to
  1852   \eqref{Dersders} for partial derivatives, namely
  1858   \eqref{Dersders} for partial derivatives, namely
  1853 
  1859   %
  1854   \begin{equation}\label{Derspders}
  1860   \begin{equation}\label{Derspders}
  1855   \mbox{\begin{tabular}{lc}
  1861   \mbox{\begin{tabular}{lc}
  1856   @{text "(i)"}  & @{thm Deriv_pderiv}\\
  1862   @{text "(i)"}  & @{thm Deriv_pderiv}\\
  1857   @{text "(ii)"} & @{thm Derivs_pderivs}
  1863   @{text "(ii)"} & @{thm Derivs_pderivs}
  1858   \end{tabular}}
  1864   \end{tabular}}
  1889   \end{proof}
  1895   \end{proof}
  1890 
  1896 
  1891   \noindent
  1897   \noindent
  1892   Taking \eqref{Dersders} and \eqref{Derspders} together gives the relationship 
  1898   Taking \eqref{Dersders} and \eqref{Derspders} together gives the relationship 
  1893   between languages of derivatives and partial derivatives
  1899   between languages of derivatives and partial derivatives
  1894 
  1900   %
  1895   \begin{equation}
  1901   \begin{equation}
  1896   \mbox{\begin{tabular}{lc}
  1902   \mbox{\begin{tabular}{lc}
  1897   @{text "(i)"}  & @{thm deriv_pderiv[symmetric]}\\
  1903   @{text "(i)"}  & @{thm deriv_pderiv[symmetric]}\\
  1898   @{text "(ii)"} & @{thm derivs_pderivs[symmetric]}
  1904   @{text "(ii)"} & @{thm derivs_pderivs[symmetric]}
  1899   \end{tabular}}
  1905   \end{tabular}}
  1905   of derivatives already built in. 
  1911   of derivatives already built in. 
  1906 
  1912 
  1907   Antimirov also proved that for every language and every regular expression 
  1913   Antimirov also proved that for every language and every regular expression 
  1908   there are only finitely many partial derivatives, whereby the set of partial
  1914   there are only finitely many partial derivatives, whereby the set of partial
  1909   derivatives of @{text r} w.r.t.~a language @{text A} is defined as
  1915   derivatives of @{text r} w.r.t.~a language @{text A} is defined as
  1910 
  1916   %
  1911   \begin{equation}\label{Pdersdef}
  1917   \begin{equation}\label{Pdersdef}
  1912   @{thm pderivs_lang_def}
  1918   @{thm pderivs_lang_def}
  1913   \end{equation}
  1919   \end{equation}
  1914 
  1920 
  1915   \begin{thrm}[(Antimirov \cite{Antimirov95})]\label{antimirov}
  1921   \begin{thrm}[\cite{Antimirov95}]\label{antimirov}
  1916   For every language @{text A} and every regular expression @{text r}, 
  1922   For every language @{text A} and every regular expression @{text r}, 
  1917   \mbox{@{thm finite_pderivs_lang}}.
  1923   \mbox{@{thm finite_pderivs_lang}}.
  1918   \end{thrm}
  1924   \end{thrm}
  1919 
  1925 
  1920   \noindent
  1926   \noindent
  1921   Antimirov's proof first establishes this theorem for the language @{term UNIV1}, 
  1927   Antimirov's proof first establishes this theorem for the language @{term UNIV1}, 
  1922   which is the set of all non-empty strings. For this he proves:
  1928   which is the set of all non-empty strings. For this he proves:
  1923 
  1929   %
  1924   \begin{equation}\label{pdersunivone}
  1930   \begin{equation}\label{pdersunivone}
  1925   \mbox{\begin{tabular}{l}
  1931   \mbox{\begin{tabular}{l}
  1926   @{thm pderivs_lang_Zero}\\
  1932   @{thm pderivs_lang_Zero}\\
  1927   @{thm pderivs_lang_One}\\
  1933   @{thm pderivs_lang_One}\\
  1928   @{thm pderivs_lang_Atom}\\
  1934   @{thm pderivs_lang_Atom}\\
  steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
  "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
  regular expression. Clearly not something you want to formalise in a theorem
  prover if it is cumbersome to reason about automata.

  %Once closure under complement is established, closure under intersection
  %and set difference is also easy, because
  %
  %\begin{center}
  %\begin{tabular}{c}
  %@{term "A \<inter> B = - (- A \<union> - B)"}\\
  %@{term "A - B = - (- A \<union> B)"}
  %\end{tabular}
  %\end{center}
  %
  %\noindent
  %Since all finite languages are regular, by closure under complement so are
  %all co-finite languages. 
  %
  %Closure of regular languages under reversal, that is
  %
  %\begin{center}
  %@{text "A\<^bsup>-1\<^esup> \<equiv> {s\<^bsup>-1\<^esup> | s \<in> A}"}
  %\end{center}
  %
  %\noindent 
  %can be shown with the help of the following operation defined recursively over 
  %regular expressions
  %
  %\begin{center}
  %\begin{tabular}{r@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  %@{thm (lhs) Rev.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) Rev.simps(1)}\\
  %@{thm (lhs) Rev.simps(2)} & @{text "\<equiv>"} & @{thm (rhs) Rev.simps(2)}\\
  %@{thm (lhs) Rev.simps(3)} & @{text "\<equiv>"} & @{thm (rhs) Rev.simps(3)}\\
  %@{thm (lhs) Rev.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]} & @{text "\<equiv>"} & 
  %    @{thm (rhs) Rev.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  %@{thm (lhs) Rev.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]} & @{text "\<equiv>"} & 
  %    @{thm (rhs) Rev.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  %@{thm (lhs) Rev.simps(6)} & @{text "\<equiv>"} & @{thm (rhs) Rev.simps(6)}\\
  %\end{tabular}
  %\end{center}
  %
  %\noindent
  %For this operation we can show
  %
  %\begin{center}
  %@{text "(\<calL>(r))\<^bsup>-1\<^esup>"}~@{text "="}~@{thm (rhs) rev_lang}
  %\end{center}
  %
  %\noindent
  %from which closure under reversal of regular languages follows.

  A perhaps surprising fact is that regular languages are closed under any
  left-quotient. Define

  \begin{center}
  using partial derivatives: From @{text A} being regular we know there exists
  a regular expression @{text r} such that @{term "A = lang r"}. We also know
  that @{term "pderivs_lang B r"} is finite for every language @{text B} and 
  regular expression @{text r} (recall Theorem~\ref{antimirov}). By definition 
  and \eqref{Derspders} we have
  %
  \begin{equation}\label{eqq}
  @{term "Deriv_lang B (lang r) = (\<Union> lang ` (pderivs_lang B r))"}
  \end{equation}

  \noindent
  B r"}, we know by \eqref{uplus} that there exists a regular expression so that
  the right-hand side of \eqref{eqq} is equal to the language \mbox{@{term "lang (\<Uplus>(pderivs_lang B
  r))"}}. Thus the regular expression @{term "\<Uplus>(pderivs_lang B r)"} verifies that
  @{term "Deriv_lang B A"} is regular.
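  The construction can be illustrated with a small executable sketch (ours, in
  Python rather than Isabelle/HOL) of Antimirov-style partial derivatives. The
  tuple encoding of regular expressions and the names @{text nullable},
  @{text pder}, @{text pders} and @{text matches} are our own; the sketch shows
  that a string is matched iff some partial derivative accepts the empty string.

```python
# Regexes as hashable tuples (our encoding):
# ("0",), ("1",), ("atom", c), ("+", r, s), (".", r, s), ("*", r)

def nullable(r):
    """Does the language of r contain the empty string?"""
    t = r[0]
    if t in ("1", "*"):
        return True
    if t == "+":
        return nullable(r[1]) or nullable(r[2])
    if t == ".":
        return nullable(r[1]) and nullable(r[2])
    return False                              # "0" and atoms

def pder(c, r):
    """Antimirov partial derivative: a *set* of regexes."""
    t = r[0]
    if t == "atom":
        return {("1",)} if r[1] == c else set()
    if t == "+":
        return pder(c, r[1]) | pder(c, r[2])
    if t == ".":
        s = {(".", p, r[2]) for p in pder(c, r[1])}
        return (s | pder(c, r[2])) if nullable(r[1]) else s
    if t == "*":
        return {(".", p, r) for p in pder(c, r[1])}
    return set()                              # "0" and "1"

def pders(s, rs):
    """Extend pder from characters to strings and sets of regexes."""
    for c in s:
        rs = set().union(*(pder(c, r) for r in rs))
    return rs

def matches(r, s):
    # s is in lang r  iff  [] is in the left-quotient Ders s (lang r)
    return any(nullable(p) for p in pders(s, {r}))

r = (".", ("*", ("atom", "a")), ("atom", "b"))    # a* b
assert matches(r, "aaab") and not matches(r, "aba")
```

  The point mirrors the equation above: the left-quotient of @{term "lang r"}
  is the union of the languages of the partial derivatives.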

  Even more surprising is the fact, given first by \citeN{Haines69}, that for
  \emph{every} language @{text A} the language
  consisting of all (scattered) substrings of @{text A} is regular (see also 
  \cite{Shallit08,Gasarch09}). 
  A \emph{(scattered) substring} can be obtained
  by striking out zero or more characters from a string. This can be defined 
  inductively in Isabelle/HOL by the following three rules:

  \begin{center}
  @{thm[mode=Axiom] emb0}\hspace{10mm} 
  @{thm[mode=Rule] emb1}\hspace{10mm} 
  @{thm[mode=Rule] emb2}
  \end{center}
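  The three rules amount to the usual one-pass subsequence test, which can be
  sketched in a few lines of Python (our illustration; the helper name
  @{text is_subseq} is not part of the formalisation):

```python
def is_subseq(x, y):
    """x is a scattered substring of y: strike out zero or more chars of y."""
    it = iter(y)
    return all(c in it for c in x)   # each 'c in it' scans y left-to-right

assert is_subseq("", "abc")          # the empty string is below everything
assert is_subseq("ac", "abc")        # strike out 'b'
assert not is_subseq("ca", "abc")    # order of characters matters
```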

  \noindent
  It is straightforward to prove that @{text "\<preceq>"} is a partial order. Now define the 
  \emph{language of substrings} and \emph{superstrings} of a language @{text A} 
  \end{center}
  
  \noindent
  We would like to establish

  \begin{thrm}[\cite{Haines69}]\label{subseqreg}
  For every language @{text A}, the languages @{text "(i)"} @{term "SUBSEQ A"} and 
  @{text "(ii)"} @{term "SUPSEQ A"}
  are regular.
  \end{thrm}

  \noindent
  Our proof follows the one given by \citeN[Pages 92--95]{Shallit08}, except that we use
  Higman's Lemma, which is already proved in the Isabelle/HOL library by
  Sternagel.
  Higman's Lemma allows us to infer that every language @{text A} forming an antichain, that is, satisfying
  %
  \begin{equation}\label{higman}
  @{text "\<forall>x, y \<in> A."}~@{term "x \<noteq> y \<longrightarrow> \<not>(x \<preceq> y) \<and> \<not>(y \<preceq> x)"}
  \end{equation} 

  \noindent
  is finite.

  The first step in our proof of Theorem~\ref{subseqreg} is to establish the 
  following simple properties for @{term SUPSEQ}
  %
  \begin{equation}\label{supseqprops}
  \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  @{thm (lhs) SUPSEQ_simps(1)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(1)}\\  
  @{thm (lhs) SUPSEQ_simps(2)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(2)}\\ 
  @{thm (lhs) SUPSEQ_atom} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_atom}\\ 
  the other direction we have  @{term "x \<in> SUPSEQ A"}. From this we obtain 
  a @{text y} such that @{term "y \<in> A"} and @{term "y \<preceq> x"}. Since we have that
  the relation \mbox{@{term "{(y, x). y \<preceq> x \<and> x \<noteq> y}"}} is well-founded, there must
  be a minimal element @{text "z"} such that @{term "z \<in> A"} and @{term "z \<preceq> y"}, 
  and hence by transitivity also \mbox{@{term "z \<preceq> x"}} (here we deviate from the argument 
  given by \citeN{Shallit08}, because Isabelle/HOL already provides an extensive infrastructure
  for reasoning about well-foundedness). Since @{term "z"} is
  minimal and an element of @{term A}, we also know that @{term z} is in @{term M}.
  From this together with \mbox{@{term "z \<preceq> x"}}, we can infer that @{term x} is in 
  @{term "SUPSEQ M"}, as required.
  \end{proof}
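  The two facts the proof relies on can be checked concretely on a finite
  sample in a short Python sketch (ours; the sample set @{text A} and the
  helper @{text is_subseq} are illustrative only): the minimal elements
  @{term M} of @{term A} under @{text "\<preceq>"} form an antichain, and every
  string in @{term A} lies above some minimal element.

```python
def is_subseq(x, y):
    """x ⪯ y: x arises from y by striking out zero or more characters."""
    it = iter(y)
    return all(c in it for c in x)

A = {"ab", "aab", "b", "abb", "ba"}              # a finite sample language
M = {x for x in A                                 # minimal elements of A
     if not any(y != x and is_subseq(y, x) for y in A)}

# M is an antichain w.r.t. ⪯ ...
assert all(not is_subseq(x, y) for x in M for y in M if x != y)
# ... and every element of A sits above some minimal element
assert all(any(is_subseq(z, x) for z in M) for x in A)
```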
  which establishes the second part.    
  \end{proof}

  \noindent
  In order to establish the first part of this theorem, we use the
  property proved by \citeN{Shallit08}, namely that
  %
  \begin{equation}\label{compl}
  @{thm SUBSEQ_complement}
  \end{equation}

  \noindent
  expressions can conveniently be defined as a datatype in theorem
  provers. For us it was therefore interesting to find out how far we can push
  this point of view. We have established in Isabelle/HOL both directions 
  of the Myhill-Nerode Theorem.
  %
  \begin{thrm}[(Myhill-Nerode Theorem)]\mbox{}\\
  A language @{text A} is regular if and only if @{thm (rhs) Myhill_Nerode}.
  \end{thrm}

  \noindent
  Having formalised this theorem means we pushed our point of view quite
  far. Using this theorem we can obviously prove when a language is \emph{not}
  regular---by establishing that it has infinitely many equivalence classes
  generated by the Myhill-Nerode Relation (this is usually the purpose of the
  Pumping Lemma).  We can also use it to establish the standard
  textbook results about closure properties of regular languages. The case of
  closure under complement is interesting, because it seems difficult to
  construct a regular expression for the complement language by direct
  means. However the existence of such a regular expression can be easily
  proved using the Myhill-Nerode Theorem.  
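  The non-regularity argument can be made concrete with a small Python sketch
  (our illustration, not part of the formalisation): for
  @{text "A = {a\<^sup>n b\<^sup>n}"} the strings @{text "a\<^sup>i"} are pairwise inequivalent
  under the Myhill-Nerode Relation, since the extension @{text "b\<^sup>i"}
  distinguishes @{text "a\<^sup>i"} from @{text "a\<^sup>j"}. Checking finitely many cases
  is of course only evidence, not the proof; the names @{text in_A} and
  @{text distinguishes} are our own.

```python
def in_A(s):
    """Membership in A = {a^n b^n | n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def distinguishes(z, x, y):
    """z witnesses that x and y are in different Myhill-Nerode classes."""
    return in_A(x + z) != in_A(y + z)

# b^i separates a^i from a^j whenever i != j, so no two of a, aa, aaa, ...
# share a class -- infinitely many classes, hence A is not regular.
assert all(distinguishes("b" * i, "a" * i, "a" * j)
           for i in range(1, 6) for j in range(1, 6) if i != j)
```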

  %Our insistence on regular expressions for proving the Myhill-Nerode Theorem
  %arose from the problem of defining formally the regularity of a language.
  %In order to guarantee consistency,
  %formalisations in HOL can only extend the logic with definitions that introduce a new concept in
  %terms of already existing notions. A convenient definition for automata
  %(based on graphs) uses a polymorphic type for the state nodes. This allows
  %us to use the standard operation for disjoint union whenever we need to compose two
  %automata. Unfortunately, we cannot use such a polymorphic definition
  %in HOL as part of the definition for regularity of a language (a predicate
  %over sets of strings).  Consider for example the following attempt:
  %
  %\begin{center}
  %@{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_dfa (M) \<and> \<calL>(M) = A"}
  %\end{center}
  %
  %\noindent
  %In this definition, the definiens is polymorphic in the type of the automata
  %@{text "M"} (indicated by the dependency on the type-variable @{text "\<alpha>"}), but the definiendum
  %@{text "is_regular"} is not. Such definitions are excluded from HOL, because
  %they can easily lead to inconsistencies (see \cite{PittsHOL4} for a simple
  %example). Also HOL does not contain type-quantifiers which would allow us to
  %get rid of the polymorphism by quantifying over the type-variable 
  %@{text "\<alpha>"}. Therefore when defining regularity in terms of automata, the
  %natural way out in HOL is to resort to state nodes with an identity, for
  %example a natural number. Unfortunately, the consequence is that we have to
  %be careful when combining two automata so that there is no clash between two
  %such states. This makes formalisations quite fiddly and rather
  %unpleasant. Regular expressions proved much more convenient for reasoning in
  %HOL since they can be defined as an inductive datatype and a reasoning
  %infrastructure comes for free. The definition of regularity in terms of
  %regular expressions poses no problem at all for HOL.  We showed in this
  %paper that they can be used for establishing the central result in regular
  %language theory---the Myhill-Nerode Theorem.

  While regular expressions are convenient, they have some limitations. One is
  that there seems to be no method of calculating a minimal regular expression
  (for example in terms of length) for a regular language, like there is for
  automata. On the other hand, efficient regular expression matching, without
  using automata, poses no problem as shown by \citeN{OwensReppyTuron09}.  For an
  implementation of a simple regular expression matcher, whose correctness has
  been formally established, we refer the reader to 
  \citeN{OwensSlind08}. In our opinion, their formalisation is considerably
  slicker than for example the approach to regular expression matching taken
  by \citeN{Harper99} and by \citeN{Yi06}.

  Our proof of the first direction is very much inspired by \emph{Brzozowski's
  algebraic method} \citeyear{Brzozowski64} used to convert a finite automaton to a regular expression. 
  The close connection can be seen by considering the
  equivalence classes as the states of the minimal automaton for the regular
  language.  However there are some subtle differences. Because our equivalence 
  classes (or correspondingly states) arise from the Myhill-Nerode Relation, the most natural
  choice is to characterise each state with the set of strings starting from
  the initial state leading up to that state. Usually, however, the states are
  2382   We presented two proofs for the second direction of the Myhill-Nerode
  2392   We presented two proofs for the second direction of the Myhill-Nerode
  2383   Theorem. One direct proof using tagging-functions and another using partial
  2393   Theorem. One direct proof using tagging-functions and another using partial
  2384   derivatives. This part of our work is where our method using regular
  2394   derivatives. This part of our work is where our method using regular
  2385   expressions shines, because we can completely side-step the standard
  2395   expressions shines, because we can completely side-step the standard
  2386   argument \cite{Kozen97} where automata need to be composed. However, it is
  2396   argument (for example used by \citeN{Kozen97}) where automata need to be composed. However, it is
  2387   also the direction where we had to spend most of the `conceptual' time, as
  2397   also the direction where we had to spend most of the `conceptual' time, as
  2388   our first proof based on tagging-functions is new for establishing the
  2398   our first proof based on tagging-functions is new for establishing the
  2389   Myhill-Nerode Theorem. All standard proofs of this direction proceed by
  2399   Myhill-Nerode Theorem. All standard proofs of this direction proceed by
  2390   arguments over automata.
  2400   arguments over automata.
  2391   
  2401   
  2392   The indirect proof for the second direction arose from our interest in
  2402   The indirect proof for the second direction arose from our interest in
  2393   Brzozowski's derivatives for regular expression matching.  While Brzozowski 
  2403   Brzozowski's derivatives for regular expression matching.  While \citeN{Brzozowski64} 
  2394   already established that there are only
  2404   already established that there are only
  2395   finitely many dissimilar derivatives for every regular expression, this
  2405   finitely many dissimilar derivatives for every regular expression, this
  2396   result is not as straightforward to formalise in a theorem prover as one
  2406   result is not as straightforward to formalise in a theorem prover as one
  2397   might wish. The reason is that the set of dissimilar derivatives is not
  2407   might wish. The reason is that the set of dissimilar derivatives is not
  2398   defined inductively, but in terms of an ACI-equivalence relation. This
  2408   defined inductively, but in terms of an ACI-equivalence relation. This
  2399   difficulty prevented for example Krauss and Nipkow to prove termination of
  2409   difficulty prevented for example \citeN{KraussNipkow11} to prove termination of
  2400   their equivalence checker for regular expressions
  2410   their equivalence checker for regular expressions. Their checker is based on Brzozowski's derivatives
  2401   \cite{KraussNipkow11}. Their checker is based on Brzozowski's derivatives
       
  2402   and for their argument the lack of a formal proof of termination is not
  2411   and for their argument the lack of a formal proof of termination is not
  2403   crucial (it merely lets them ``sleep better'' \cite{KraussNipkow11}).  We
  2412   crucial (it merely lets them ``sleep better'' \cite{KraussNipkow11}).  We
  2404   expect that their development simplifies by using partial derivatives,
  2413   expect that their development simplifies by using partial derivatives,
  2405   instead of derivatives, and that the termination of the algorithm can be
  2414   instead of derivatives, and that the termination of the algorithm can be
  2406   formally established (the main ingredient is
  2415   formally established (the main ingredient is
  2407   Theorem~\ref{antimirov}). However, since partial derivatives use sets of
  2416   Theorem~\ref{antimirov}). However, since partial derivatives use sets of
  2408   regular expressions, one needs to carefully analyse whether the resulting
  2417   regular expressions, one needs to carefully analyse whether the resulting
  2409   algorithm is still executable. Given the existing infrastructure for
  2418   algorithm is still executable. Given the infrastructure for
  2410   executable sets in Isabelle/HOL \cite{Haftmann09}, it should.
  2419   executable sets  introduced by \citeN{Haftmann09} in Isabelle/HOL, it should.
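  The termination ingredient can be demonstrated on a small scale with a
  Python sketch (ours): iterating partial derivatives from a regular
  expression saturates after finitely many steps, in contrast to unquotiented
  Brzozowski derivatives, which need ACI-equivalence to stay finite. The tuple
  encoding and the name @{text saturate} are our own.

```python
# Regexes as hashable tuples (our encoding):
# ("0",), ("1",), ("atom", c), ("+", r, s), (".", r, s), ("*", r)

def nullable(r):
    t = r[0]
    return (t in ("1", "*")
            or (t == "+" and (nullable(r[1]) or nullable(r[2])))
            or (t == "." and nullable(r[1]) and nullable(r[2])))

def pder(c, r):
    """Antimirov partial derivative of r w.r.t. character c (a set)."""
    t = r[0]
    if t == "atom":
        return {("1",)} if r[1] == c else set()
    if t == "+":
        return pder(c, r[1]) | pder(c, r[2])
    if t == ".":
        s = {(".", p, r[2]) for p in pder(c, r[1])}
        return (s | pder(c, r[2])) if nullable(r[1]) else s
    if t == "*":
        return {(".", p, r) for p in pder(c, r[1])}
    return set()

def saturate(r, alphabet):
    """Close {r} under taking partial derivatives; terminates by finiteness."""
    seen, todo = {r}, [r]
    while todo:
        x = todo.pop()
        for c in alphabet:
            for p in pder(c, x) - seen:
                seen.add(p)
                todo.append(p)
    return seen

r = (".", ("*", ("+", ("atom", "a"), ("atom", "b"))), ("atom", "a"))  # (a+b)* a
assert len(saturate(r, "ab")) <= 4   # the fixed point is small and finite
```

  The worklist loop is exactly the computation whose termination
  Theorem~\ref{antimirov} would justify in general.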
  2411 
  2420 
  2412   Our formalisation of the Myhill-Nerode Theorem consists of 780 lines of
  2421   Our formalisation of the Myhill-Nerode Theorem consists of 780 lines of
  2413   Isabelle/Isar code for the first direction and 460 for the second (the one
  2422   Isabelle/Isar code for the first direction and 460 for the second (the one
  2414   based on tagging-functions), plus around 300 lines of standard material
  2423   based on tagging-functions), plus around 300 lines of standard material
  2415   about regular languages. The formalisation of derivatives and partial
  2424   about regular languages. The formalisation of derivatives and partial
  2416   derivatives shown in Section~\ref{derivatives} consists of 390 lines of
  2425   derivatives shown in Section~\ref{derivatives} consists of 390 lines of
  2417   code.  The closure properties in Section~\ref{closure} (except Theorem~\ref{subseqreg}) 
  2426   code.  The closure properties in Section~\ref{closure} (except Theorem~\ref{subseqreg}) 
  2418   can be established in
  2427   can be established in
  2419   190 lines of code. The Continuation Lemma and the non-regularity of @{text "a\<^sup>n b\<^sup>n"} 
  2428   100 lines of code. The Continuation Lemma and the non-regularity of @{text "a\<^sup>n b\<^sup>n"} 
  2420   require 70 lines of code.
  2429   require 70 lines of code.
  2421   The algorithm for solving equational systems, which we
  2430   The algorithm for solving equational systems, which we
  2422   used in the first direction, is conceptually relatively simple. Still the
  2431   used in the first direction, is conceptually relatively simple. Still the
  2423   use of sets over which the algorithm operates means it is not as easy to
  2432   use of sets over which the algorithm operates means it is not as easy to
  2424   formalise as one might hope. However, it seems sets cannot be avoided since
  2433   formalise as one might hope. However, it seems sets cannot be avoided since
  2425   the `input' of the algorithm consists of equivalence classes and we cannot
  2434   the `input' of the algorithm consists of equivalence classes and we cannot
  2426   see how to reformulate the theory so that we can use lists or matrices. Lists would be
  2435   see how to reformulate the theory so that we can use lists or matrices. Lists would be
  2427   much easier to reason about, since we can define functions over them by
  2436   much easier to reason about, since we can define functions over them by
  2428   recursion. For sets we have to use set-comprehensions, which is slightly
  2437   recursion. For sets we have to use set-comprehensions, which is slightly
  2429   unwieldy. Matrices would allow us to use the slick formalisation by 
  2438   unwieldy. Matrices would allow us to use the slick formalisation by 
  2430   Nipkow of the Gauss-Jordan algorithm \cite{Nipkow11}.
  2439   \citeN{Nipkow11} of the Gauss-Jordan algorithm.
  2431 
  2440 
  While our formalisation might appear large, it should be seen
  in the context of the work done by \citeN{Constable00}, who
  formalised the Myhill-Nerode Theorem in Nuprl using automata. They write
  that their four-member team would need something of the order of 18 months
  for their formalisation of the first eleven chapters of the textbook by \citeN{HopcroftUllman69},
  which includes the Myhill-Nerode theorem. It is hard to gauge the size of a
  formalisation in Nuprl, but from what is shown in the Nuprl Math Library
  about their development it seems substantially larger than ours. We attribute
  this to our use of regular expressions, which meant we did not need to `fight'
  the theorem prover.
  Also, \citeN{Filliatre97} reports that his formalisation in
  Coq of automata theory and Kleene's theorem is ``rather big''.
  More recently, \citeN{Almeidaetal10} reported on another
  formalisation of regular languages in Coq. Their
  main result is the
  correctness of Mirkin's construction of an automaton from a regular
  expression using partial derivatives. This took approximately 10600 lines
  of code.  Also \citeN{Braibant12} formalised a large part of regular language
  theory and Kleene algebras in Coq. While he is mainly interested
  in implementing decision procedures for Kleene algebras, his library
  includes a proof of the Myhill-Nerode theorem. He reckons that our
  ``development is more concise'' than his, which is based on matrices \cite[Page 67]{Braibant12}.
  In terms of time, the estimate for our formalisation is that we
  needed approximately 3 months, and this included the time to find our proof
  arguments. Unlike \citeN{Constable00}, who were able to follow the Myhill-Nerode
  proof by \citeN{HopcroftUllman69}, we had to find our own arguments.  So for us the
  formalisation was not the bottleneck.  The code of
  our formalisation can be found in the Archive of Formal Proofs at
  \mbox{\url{http://afp.sourceforge.net/entries/Myhill-Nerode.shtml}}
  \cite{myhillnerodeafp11}.\smallskip

  \noindent
  {\bf Acknowledgements:}
  We are grateful for the comments we received from Larry Paulson.  Tobias
  Nipkow made us aware of the properties in Theorem~\ref{subseqreg} and Tjark
  Weber helped us with proving them. Christian Sternagel provided us with a
  version of Higman's Lemma that applies to arbitrary but finite alphabets.

  \bibliographystyle{acmtrans}
  \bibliography{root}

  \newpage
  \begin{appendix}
  \section{Appendix$^\star$}