--- a/Journal/Paper.thy Wed Aug 17 07:43:09 2011 +0000
+++ b/Journal/Paper.thy Wed Aug 17 17:36:19 2011 +0000
@@ -157,10 +157,10 @@
limitations that hurt badly, if one attempts a simple-minded formalisation
of regular languages in it.
- The typical approach to regular languages is to
- introduce finite automata and then define everything in terms of them
- \cite{ HopcroftUllman69,Kozen97}. For example, a regular language is
- normally defined as:
+  The typical approach to regular languages is to introduce deterministic
+  finite automata and then define everything in terms of them
+  \cite{HopcroftUllman69,Kozen97}. For example, a regular language is
+  normally defined as:
\begin{dfntn}\label{baddef}
A language @{text A} is \emph{regular}, provided there is a
@@ -252,13 +252,14 @@
\noindent
changes the type---the disjoint union is not a set, but a set of
pairs. Using this definition for disjoint union means we do not have a
- single type for the states of automata. As a result we will not be able to define a
- regular language as one for which there exists an automaton that recognises
- all its strings (Definition~\ref{baddef}). This is because we cannot make a
- definition in HOL that is only polymorphic in the state type and there is no type
- quantification available in HOL (unlike in Coq, for example).\footnote{Slind
- already pointed out this problem in an email to the HOL4 mailing list on
- 21st April 2005.}
+ single type for the states of automata. As a result we will not be able to
+  define in our formalisation a regular language as one for which there exists
+ an automaton that recognises all its strings (Definition~\ref{baddef}). This
+ is because we cannot make a definition in HOL that is only polymorphic in
+ the state type, but not in the predicate for regularity; and there is no
+ type quantification available in HOL (unlike in Coq, for
+ example).\footnote{Slind already pointed out this problem in an email to the
+ HOL4 mailing list on 21st April 2005.}
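+
+  To make the type problem concrete, here is one possible HOL rendering of
+  the disjoint union from \eqref{disjointunion}; this is only a sketch in our
+  own notation and need not coincide with the formalisation's definition:
+
+    (* the tags 1 and 2 mark the left and right component of the union *)
+    definition
+      dsum :: "'a set \<Rightarrow> 'a set \<Rightarrow> (nat \<times> 'a) set"
+    where
+      "dsum A B \<equiv> {(1, x) | x. x \<in> A} \<union> {(2, y) | y. y \<in> B}"
+
+  \noindent
+  The result type @{text "(nat \<times> 'a) set"} differs from the argument type
+  @{text "'a set"}, which is precisely the type change described above.
+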
An alternative, which provides us with a single type for states of automata,
is to give every state node an identity, for example a natural number, and
@@ -268,12 +269,15 @@
   matrices results in very ad hoc constructions, which are not pleasant to
reason about.
- Functions are much better supported in Isabelle/HOL, but they still lead to similar
- problems as with graphs. Composing, for example, two non-deterministic automata in parallel
- requires also the formalisation of disjoint unions. Nipkow \cite{Nipkow98}
- dismisses for this the option of using identities, because it leads according to
- him to ``messy proofs''. Since he does not need to define what regular
- languages are, Nipkow opts for a variant of \eqref{disjointunion} using bit lists, but writes
+  Functions are much better supported in Isabelle/HOL, but they still lead to
+  problems similar to those with graphs. Composing, for example, two
+  non-deterministic automata in parallel also requires the formalisation of
+  disjoint unions. Nipkow \cite{Nipkow98} dismisses the option of using
+  identities for this purpose, because according to him it leads to ``messy
+  proofs''. Since he does not need to define what regular languages are,
+  Nipkow opts for a variant of \eqref{disjointunion} using bit lists, but
+  writes\smallskip
+
\begin{quote}
\it%
@@ -281,10 +285,10 @@
`` & All lemmas appear obvious given a picture of the composition of automata\ldots
Yet their proofs require a painful amount of detail.''
\end{tabular}
- \end{quote}
+ \end{quote}\smallskip
\noindent
- and
+ and\smallskip
\begin{quote}
\it%
@@ -292,7 +296,7 @@
`` & If the reader finds the above treatment in terms of bit lists revoltingly
concrete, I cannot disagree. A more abstract approach is clearly desirable.''
\end{tabular}
- \end{quote}
+ \end{quote}\smallskip
\noindent
@@ -1315,10 +1319,10 @@
\begin{center}
\begin{tabular}{c}
\scalebox{1}{
- \begin{tikzpicture}
- \node[draw,minimum height=3.8ex] (x) { $\hspace{4.8em}@{text x}\hspace{4.8em}$ };
- \node[draw,minimum height=3.8ex, right=-0.03em of x] (za) { $\hspace{0.6em}@{text "z\<^isub>p"}\hspace{0.6em}$ };
- \node[draw,minimum height=3.8ex, right=-0.03em of za] (zza) { $\hspace{2.6em}@{text "z\<^isub>s"}\hspace{2.6em}$ };
+ \begin{tikzpicture}[fill=gray!20]
+ \node[draw,minimum height=3.8ex, fill] (x) { $\hspace{4.8em}@{text x}\hspace{4.8em}$ };
+ \node[draw,minimum height=3.8ex, right=-0.03em of x, fill] (za) { $\hspace{0.6em}@{text "z\<^isub>p"}\hspace{0.6em}$ };
+ \node[draw,minimum height=3.8ex, right=-0.03em of za, fill] (zza) { $\hspace{2.6em}@{text "z\<^isub>s"}\hspace{2.6em}$ };
\draw[decoration={brace,transform={yscale=3}},decorate]
(x.north west) -- ($(za.north west)+(0em,0em)$)
@@ -1342,10 +1346,10 @@
\end{tikzpicture}}
\\[2mm]
\scalebox{1}{
- \begin{tikzpicture}
- \node[draw,minimum height=3.8ex] (xa) { $\hspace{3em}@{text "x\<^isub>p"}\hspace{3em}$ };
- \node[draw,minimum height=3.8ex, right=-0.03em of xa] (xxa) { $\hspace{0.2em}@{text "x\<^isub>s"}\hspace{0.2em}$ };
- \node[draw,minimum height=3.8ex, right=-0.03em of xxa] (z) { $\hspace{5em}@{text z}\hspace{5em}$ };
+ \begin{tikzpicture}[fill=gray!20]
+ \node[draw,minimum height=3.8ex, fill] (xa) { $\hspace{3em}@{text "x\<^isub>p"}\hspace{3em}$ };
+ \node[draw,minimum height=3.8ex, right=-0.03em of xa, fill] (xxa) { $\hspace{0.2em}@{text "x\<^isub>s"}\hspace{0.2em}$ };
+ \node[draw,minimum height=3.8ex, right=-0.03em of xxa, fill] (z) { $\hspace{5em}@{text z}\hspace{5em}$ };
\draw[decoration={brace,transform={yscale=3}},decorate]
(xa.north west) -- ($(xxa.north east)+(0em,0em)$)
@@ -1458,11 +1462,11 @@
\begin{center}
\scalebox{1}{
- \begin{tikzpicture}
- \node[draw,minimum height=3.8ex] (xa) { $\hspace{4em}@{text "x\<^bsub>pmax\<^esub>"}\hspace{4em}$ };
- \node[draw,minimum height=3.8ex, right=-0.03em of xa] (xxa) { $\hspace{0.5em}@{text "x\<^bsub>s\<^esub>"}\hspace{0.5em}$ };
- \node[draw,minimum height=3.8ex, right=-0.03em of xxa] (za) { $\hspace{2em}@{text "z\<^isub>a"}\hspace{2em}$ };
- \node[draw,minimum height=3.8ex, right=-0.03em of za] (zb) { $\hspace{7em}@{text "z\<^isub>b"}\hspace{7em}$ };
+ \begin{tikzpicture}[fill=gray!20]
+ \node[draw,minimum height=3.8ex, fill] (xa) { $\hspace{4em}@{text "x\<^bsub>pmax\<^esub>"}\hspace{4em}$ };
+ \node[draw,minimum height=3.8ex, right=-0.03em of xa, fill] (xxa) { $\hspace{0.5em}@{text "x\<^bsub>s\<^esub>"}\hspace{0.5em}$ };
+ \node[draw,minimum height=3.8ex, right=-0.03em of xxa, fill] (za) { $\hspace{2em}@{text "z\<^isub>a"}\hspace{2em}$ };
+ \node[draw,minimum height=3.8ex, right=-0.03em of za, fill] (zb) { $\hspace{7em}@{text "z\<^isub>b"}\hspace{7em}$ };
\draw[decoration={brace,transform={yscale=3}},decorate]
(xa.north west) -- ($(xxa.north east)+(0em,0em)$)
@@ -1654,13 +1658,11 @@
follows.
\begin{center}
- \begin{tabular}{c@ {\hspace{10mm}}c}
+ \begin{tabular}{c}
\begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
@{thm (lhs) nullable.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(1)}\\
@{thm (lhs) nullable.simps(2)} & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(2)}\\
@{thm (lhs) nullable.simps(3)} & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(3)}\\
- \end{tabular} &
- \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
@{thm (lhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}
& @{text "\<equiv>"} & @{thm (rhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
@{thm (lhs) nullable.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}
@@ -1685,8 +1687,15 @@
\noindent
The importance of this fact in the context of the Myhill-Nerode theorem is that
we can use \eqref{mhders} and \eqref{Dersders} in order to
- establish that @{term "x \<approx>(lang r) y"} is equivalent to
- \mbox{@{term "lang (ders x r) = lang (ders y r)"}}. Hence
+ establish that
+
+ \begin{center}
+ @{term "x \<approx>(lang r) y"} \hspace{4mm}if and only if\hspace{4mm}
+ @{term "lang (ders x r) = lang (ders y r)"}.
+ \end{center}
+
+ \noindent
+ holds and hence
\begin{equation}
@{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "ders x r = ders y r"}
@@ -1694,7 +1703,7 @@
\noindent
- which means the right-hand side (seen as a relation) refines the Myhill-Nerode
+ This means the right-hand side (seen as a relation) refines the Myhill-Nerode
relation. Consequently, we can use @{text
"\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as a
tagging-relation. However, in order to be useful for the second part of the
@@ -1840,7 +1849,7 @@
of derivatives already built in.
Antimirov also proved that for every language and regular expression
- there are only finitely many partial derivatives, whereby the partial
+  there are only finitely many partial derivatives, where the set of partial
derivatives of @{text r} w.r.t.~a language @{text A} is defined as
\begin{equation}\label{Pdersdef}
@@ -1853,7 +1862,7 @@
\end{thrm}
\noindent
- Antimirov's proof first shows this theorem for the language @{term UNIV1},
+ Antimirov's proof first establishes this theorem for the language @{term UNIV1},
which is the set of all non-empty strings. For this he proves:
\begin{equation}\label{pdersunivone}
@@ -1885,10 +1894,10 @@
and for all languages @{text "A"}, @{term "pders_lang A r"} is a subset of
@{term "pders_lang UNIV r"}. Since we follow Antimirov's proof quite
closely in our formalisation (only the last two cases of
- \eqref{pdersunivone} involve some non-routine induction argument), we omit
+ \eqref{pdersunivone} involve some non-routine induction arguments), we omit
the details.
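+
+  Although we omit these details, the character-level partial derivatives
+  behind \eqref{Pdersdef} can be spelled out as follows; this is a
+  self-contained sketch of Antimirov's clauses in which the datatype and all
+  names are ours and need not coincide with the formalisation:
+
+    (* illustrative datatype; constructor names need not match the paper's *)
+    datatype rexp = Zero | One | Atom char
+                  | Plus rexp rexp | Times rexp rexp | Star rexp
+
+    fun nullable :: "rexp \<Rightarrow> bool"
+    where
+      "nullable Zero = False"
+    | "nullable One = True"
+    | "nullable (Atom c) = False"
+    | "nullable (Plus r1 r2) = (nullable r1 \<or> nullable r2)"
+    | "nullable (Times r1 r2) = (nullable r1 \<and> nullable r2)"
+    | "nullable (Star r) = True"
+
+    (* append r on the right of every expression in a set *)
+    definition
+      Timess :: "rexp set \<Rightarrow> rexp \<Rightarrow> rexp set"
+    where
+      "Timess rs r \<equiv> {Times r' r | r'. r' \<in> rs}"
+
+    (* Antimirov-style partial derivative w.r.t. a single character *)
+    fun pder :: "char \<Rightarrow> rexp \<Rightarrow> rexp set"
+    where
+      "pder c Zero = {}"
+    | "pder c One = {}"
+    | "pder c (Atom c') = (if c = c' then {One} else {})"
+    | "pder c (Plus r1 r2) = pder c r1 \<union> pder c r2"
+    | "pder c (Times r1 r2) =
+         Timess (pder c r1) r2 \<union> (if nullable r1 then pder c r2 else {})"
+    | "pder c (Star r) = Timess (pder c r) (Star r)"
+
+  \noindent
+  The string-level @{term "pders x r"} and the language-level
+  @{term "pders_lang A r"} are then obtained by extending @{text pder} to
+  strings and taking the union over all strings of a language, as in
+  \eqref{Pdersdef}.
+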
- Let us now return to our proof about the second direction in the Myhill-Nerode
+ Let us now return to our proof for the second direction in the Myhill-Nerode
theorem. The point of the above calculations is to use
@{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as tagging-relation.
@@ -1915,14 +1924,14 @@
\noindent
Now the range of @{term "\<lambda>x. pders x r"} is a subset of @{term "Pow (pders_lang UNIV r)"},
- which we know is finite by Theorem~\ref{antimirov}. This means there
+ which we know is finite by Theorem~\ref{antimirov}. Consequently there
are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"},
- which refines @{term "\<approx>(lang r)"}, and consequently we can again conclude the
+ which refines @{term "\<approx>(lang r)"}, and therefore we can again conclude the
second part of the Myhill-Nerode theorem.
\end{proof}
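+
+  The finiteness step in the above argument is an instance of the following
+  small generic fact; the statement and proof are a self-contained sketch
+  (the lemma name is ours):
+
+    (* a function whose values are all bounded by a finite set has finite range *)
+    lemma finite_range_via_Pow:
+      assumes "finite B"
+      and     "\<And>x. f x \<subseteq> B"
+      shows   "finite (range f)"
+    proof -
+      have "range f \<subseteq> Pow B" using assms(2) by auto
+      moreover have "finite (Pow B)" using assms(1) by simp
+      ultimately show ?thesis by (rule finite_subset)
+    qed
+
+  \noindent
+  Instantiating @{text f} with @{term "\<lambda>x. pders x r"} and @{text B} with
+  @{term "pders_lang UNIV r"} yields the finiteness of the range used in the
+  proof above.
+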
*}
-section {* Closure Properties of Regular Languages *}
+section {* Closure Properties of Regular Languages\label{closure} *}
text {*
\noindent
@@ -1960,7 +1969,8 @@
\end{center}
\noindent
- Closure of regular languages under reversal, that is
+  Since all finite languages are regular, it follows by closure under
+  complement that all co-finite languages are also regular. Closure of
+  regular languages under reversal, that is
\begin{center}
@{text "A\<^bsup>-1\<^esup> \<equiv> {s\<^bsup>-1\<^esup> | s \<in> A}"}
@@ -2051,10 +2061,10 @@
proved using the Myhill-Nerode theorem.
Our insistence on regular expressions for proving the Myhill-Nerode theorem
- arose from the limitations of HOL, on which the popular theorem provers HOL4,
- HOLlight and Isabelle/HOL are based. In order to guarantee consistency,
- formalisations can extend HOL with definitions that introduce a new concept in
- only terms of already existing notions. A convenient definition for automata
+  arose from the limitations of HOL, the logic underlying the popular theorem
+  provers HOL4, HOLlight and Isabelle/HOL. In order to guarantee consistency,
+  formalisations in HOL can only extend the logic with definitions that
+  introduce a new concept in terms of already existing notions. A convenient
+  definition for automata
(based on graphs) uses a polymorphic type for the state nodes. This allows
us to use the standard operation for disjoint union whenever we need to compose two
automata. Unfortunately, we cannot use such a polymorphic definition
@@ -2062,14 +2072,14 @@
   over sets of strings). Consider the following attempt:
\begin{center}
- @{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_finite_automata (M) \<and> \<calL>(M) = A"}
+ @{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_dfa (M) \<and> \<calL>(M) = A"}
\end{center}
\noindent
   In this definition, the definiens is polymorphic in the type of the automaton
- @{text "M"} (indicated by the @{text "\<alpha>"}), but the definiendum @{text
- "is_regular"} is not. Such definitions are excluded in HOL, because they can
- lead easily to inconsistencies (see \cite{PittsHOL4} for a simple
+  @{text "M"} (indicated by its dependency on @{text "\<alpha>"}), but the definiendum
+  @{text "is_regular"} is not. Such definitions are excluded in HOL, because
+  they can easily lead to inconsistencies (see \cite{PittsHOL4} for a simple
example). Also HOL does not contain type-quantifiers which would allow us to
get rid of the polymorphism by quantifying over the type-variable @{text
"\<alpha>"}. Therefore when defining regularity in terms of automata, the only
@@ -2079,8 +2089,10 @@
such states. This makes formalisations quite fiddly and rather
unpleasant. Regular expressions proved much more convenient for reasoning in
   HOL since they can be defined as an inductive datatype and a reasoning
- infrastructure comes for free. We showed they can be used for establishing
- the central result in regular language theory---the Myhill-Nerode theorem.
+  infrastructure comes for free. The definition of regularity in terms of
+  regular expressions poses no problem at all for HOL. We showed in this
+  paper that regular expressions can be used for establishing the central
+  result in regular language theory---the Myhill-Nerode theorem.
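+
+  A minimal sketch of such a definition, where the type annotation is ours
+  and @{text lang} is the language function for regular expressions used
+  throughout this paper:
+
+    (* lang :: rexp => string set is assumed from the paper's development *)
+    definition
+      is_regular :: "string set \<Rightarrow> bool"
+    where
+      "is_regular A \<equiv> (\<exists>r. lang r = A)"
+
+  \noindent
+  In contrast to the automata-based attempt above, the definiens contains no
+  type variable that is absent from the definiendum, and so HOL's
+  definitional principle accepts it.
+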
While regular expressions are convenient, they have some limitations. One is
that there seems to be no method of calculating a minimal regular expression
@@ -2141,20 +2153,21 @@
algorithm is still executable. Given the existing infrastructure for
executable sets in Isabelle/HOL, it should.
-
Our formalisation of the Myhill-Nerode theorem consists of 780 lines of
Isabelle/Isar code for the first direction and 460 for the second (the one
- based on tagging functions), plus around 300 lines of standard material
+ based on tagging-functions), plus around 300 lines of standard material
about regular languages. The formalisation of derivatives and partial
derivatives shown in Section~\ref{derivatives} consists of 390 lines of
- code. The algorithm for solving equational systems, which we used in the
- first direction, is conceptually relatively simple. Still the use of sets
- over which the algorithm operates means it is not as easy to formalise as
- one might hope. However, it seems sets cannot be avoided since the `input'
- of the algorithm consists of equivalence classes and we cannot see how to
- reformulate the theory so that we can use lists. Lists would be much easier
- to reason about, since we can define functions over them by recursion. For
- sets we have to use set-comprehensions, which is slightly unwieldy.
+ code. The closure properties in Section~\ref{closure} can be established in
+ 190 lines of code. The algorithm for solving equational systems, which we
+ used in the first direction, is conceptually relatively simple. Still the
+ use of sets over which the algorithm operates means it is not as easy to
+ formalise as one might hope. However, it seems sets cannot be avoided since
+ the `input' of the algorithm consists of equivalence classes and we cannot
+ see how to reformulate the theory so that we can use lists. Lists would be
+ much easier to reason about, since we can define functions over them by
+ recursion. For sets we have to use set-comprehensions, which is slightly
+ unwieldy.
While our formalisation might appear large, it should be seen
   in the context of the work done by Constable et al.~\cite{Constable00} who