--- a/Journal/Paper.thy Mon Jul 25 18:00:52 2011 +0000
+++ b/Journal/Paper.thy Tue Jul 26 10:58:26 2011 +0000
@@ -29,16 +29,14 @@
"_Collect p P" <= "{p|xs. P}"
"_CollectIn p A P" <= "{p : A. P}"
-abbreviation "NULL \<equiv> Zero"
-abbreviation "EMPTY \<equiv> One"
-abbreviation "CHAR \<equiv> Atom"
-abbreviation "ALT \<equiv> Plus"
-abbreviation "SEQ \<equiv> Times"
+abbreviation "ZERO \<equiv> Zero"
+abbreviation "ONE \<equiv> One"
+abbreviation "ATOM \<equiv> Atom"
+abbreviation "PLUS \<equiv> Plus"
+abbreviation "TIMES \<equiv> Times"
abbreviation "STAR \<equiv> Star"
-ML {* @{term "op ^^"} *}
-
notation (latex output)
str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and
str_eq ("_ \<approx>\<^bsub>_\<^esub> _") and
@@ -132,11 +130,11 @@
(e.g.~\cite{Kozen97, HopcroftUllman69}). It seems natural to exercise theorem provers by
formalising the theorems and by verifying formally the algorithms. A
popular choice for a theorem prover would be one based on Higher-Order Logic
- (HOL), such as HOL4, HOLlight and Isabelle/HOL. For our development we will
- use the latter theorem prover. One distinguishing feature of HOL is it's
- type system based on Church's Simple Theory of Types \cite{Church40}. The
- limitations of this type system are one of the main motivations
- behind the work presented in this paper.
+ (HOL), for example HOL4, HOLlight and Isabelle/HOL. For our development
+ we will use the latter. One distinguishing feature of HOL is it's
+ type system, which is based on Church's Simple Theory of Types \cite{Church40}. The
+ limitations of this type system are one of the underlying motivations for the
+ work presented in this paper.
The typical approach to regular languages is to
introduce finite automata and then define everything in terms of them. For
@@ -225,12 +223,13 @@
\end{equation}
\noindent
- changes the type---the disjoint union is not a set, but a set of pairs.
- Using this definition for disjoint union means we do not have a single type for automata
- and hence will not be able to state certain properties about \emph{all}
- automata, since there is no type quantification available in HOL (unlike in Coq, for example).
- As a result, we would not be able to define a language being regular
- in terms of the existence of an automata.
+ changes the type---the disjoint union is not a set, but a set of
+ pairs. Using this definition for disjoint union means we do not have a
+ single type for automata. As a result we will not be able to define a regular
+ language as one for which there exists an automaton that recognises all its
+ strings. Similarly, we cannot state properties about \emph{all} automata,
+ since there is no type quantification available in HOL (unlike in Coq, for
+ example).
An alternative, which provides us with a single type for automata, is to give every
state node an identity, for example a natural
@@ -283,24 +282,24 @@
larger formalisations of automata theory are carried out in Nuprl
\cite{Constable00} and in Coq \cite{Filliatre97}.
- Also one might consider the Myhill-Nerode theorem as well-worn stock
- material for a computer scientists, but paper proofs of this theorem contain some
- subtle side-conditions which are easily overlooked and which make formal reasoning
- painful. For example in Kozen's proof \cite{Kozen97} it is not sufficient to
- have just any automata recognising a regular language, but one which does
- not have inaccessible states. This means if we define a regular language
- for which \emph{any} finite automaton exists, then one has to ensure that
- another equivalent can be found satisfying the side-conditions. Similarly
- Brozowski mentiones in \cite{Brozowski10} side-conditions of finite automata
- in connection of state-complexity: there it is required that automata
- must be complete in the sense of having a total transition function.
- Furthermore, if a `sink' state is present which accepts no word, then there
- must be only one such state. . . .
- Such 'little' lemmas make formalisation of these results in a theroem
- prover hair-pulling experiences.
+ One might also consider the Myhill-Nerode theorem as well-worn stock
+ material where everything is clear. However, paper proofs of this theorem
+ often involve subtle side-conditions which are easily overlooked, but which
+ make formal reasoning rather painful. For example Kozen's proof requires
+ that the automata do not have inaccessible states \cite{Kozen97}. Another
+ subtle side-condition is completeness of automata:
+ automata need to have total transition functions and at most one `sink'
+ state from which there is no connection to a final state (Brozowski mentions
+ this side-condition in connection with state complexity
+ \cite{Brozowski10}). Such side-conditions mean that if we define a regular
+ language as one for which there exists \emph{any} finite automaton, then we
+ need a lemma which ensures that another equivalent can be found satisfying the
+ side-condition. Unfortunately, such `little' and `obvious' lemmas make
+ formalisations of results in automata theory hair-pulling experiences.
+
In this paper, we will not attempt to formalise automata theory in
- Isabelle/HOL nor attempt to formalise any automata proof from the
+ Isabelle/HOL nor will we attempt to formalise automata proofs from the
literature, but take a different approach to regular languages than is
usually taken. Instead of defining a regular language as one where there
exists an automaton that recognises all strings of the language, we define a
@@ -313,9 +312,10 @@
\noindent
The reason is that regular expressions, unlike graphs, matrices and
- functions, can be easily defined as inductive datatype. Consequently a
- corresponding reasoning infrastructure (like induction or recursion) comes
- for free. This has recently been exploited in HOL4 with a formalisation of
+ functions, can be easily defined as an inductive datatype. No side-conditions
+ will be needed for regular expressions. Moreover, a reasoning infrastructure
+ (like induction and recursion) comes for free in HOL-based theorem provers.
+ This has recently been exploited in HOL4 with a formalisation of
regular expression matching based on derivatives \cite{OwensSlind08} and
with an equivalence checker for regular expressions in Isabelle/HOL
\cite{KraussNipkow11}. The purpose of this paper is to show that a central
@@ -329,13 +329,13 @@
{\bf Contributions:} There is an extensive literature on regular languages.
To our best knowledge, our proof of the Myhill-Nerode theorem is the first
that is based on regular expressions, only. The part of this theorem stating
- that finitely many partitions of a language w.r.t.~to the Myhill-Nerode
- relation imply that the language is regular is proved by a folklore argument
- of solving equational sytems. For the other part we give two proofs: a
+ that finitely many partitions imply regularity of the language is proved by
+ an argument about solving equational sytems. This argument seems to be folklore.
+ For the other part, we give two proofs: a
direct proof using certain tagging-functions, and an indirect proof using
Antimirov's partial derivatives \cite{Antimirov95} (also earlier russion work).
Again to our best knowledge, the tagging-functions have not been used before to
- establish the Myhill-Nerode theorem.
+ establish the Myhill-Nerode theorem.
*}
@@ -432,11 +432,11 @@
\begin{center}
@{text r} @{text "::="}
- @{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
- @{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
- @{term "CHAR c"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
- @{term "SEQ r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
- @{term "ALT r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
+ @{term ZERO}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
+ @{term ONE}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
+ @{term "ATOM c"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
+ @{term "TIMES r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
+ @{term "PLUS r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
@{term "STAR r"}
\end{center}
@@ -466,8 +466,8 @@
a regular expression that matches the union of all languages of @{text rs}. We only need to know the
existence
of such a regular expression and therefore we use Isabelle/HOL's @{const "fold_graph"} and Hilbert's
- @{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This operation, roughly speaking, folds @{const ALT} over the
- set @{text rs} with @{const NULL} for the empty set. We can prove that for a finite set @{text rs}
+ @{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This operation, roughly speaking, folds @{const PLUS} over the
+ set @{text rs} with @{const ZERO} for the empty set. We can prove that for a finite set @{text rs}
%
\begin{equation}\label{uplus}
\mbox{@{thm (lhs) folds_alt_simp} @{text "= \<Union> (\<calL> ` rs)"}}
@@ -567,15 +567,15 @@
\begin{center}
\begin{tabular}{rcl}
- @{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) + \<dots> + (Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) + \<lambda>(EMPTY)"} \\
- @{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, CHAR c\<^isub>2\<^isub>1) + \<dots> + (Y\<^isub>2\<^isub>o, CHAR c\<^isub>2\<^isub>o)"} \\
+ @{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, ATOM c\<^isub>1\<^isub>1) + \<dots> + (Y\<^isub>1\<^isub>p, ATOM c\<^isub>1\<^isub>p) + \<lambda>(ONE)"} \\
+ @{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, ATOM c\<^isub>2\<^isub>1) + \<dots> + (Y\<^isub>2\<^isub>o, ATOM c\<^isub>2\<^isub>o)"} \\
& $\vdots$ \\
- @{text "X\<^isub>n"} & @{text "="} & @{text "(Y\<^isub>n\<^isub>1, CHAR c\<^isub>n\<^isub>1) + \<dots> + (Y\<^isub>n\<^isub>q, CHAR c\<^isub>n\<^isub>q)"}\\
+ @{text "X\<^isub>n"} & @{text "="} & @{text "(Y\<^isub>n\<^isub>1, ATOM c\<^isub>n\<^isub>1) + \<dots> + (Y\<^isub>n\<^isub>q, ATOM c\<^isub>n\<^isub>q)"}\\
\end{tabular}
\end{center}
\noindent
- where the terms @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"}
+ where the terms @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"}
stand for all transitions @{term "Y\<^isub>i\<^isub>j \<Turnstile>c\<^isub>i\<^isub>j\<Rightarrow>
X\<^isub>i"}.
%The intuition behind the equational system is that every
@@ -584,10 +584,10 @@
%are the @{text "Y\<^isub>i\<^isub>j"}; the @{text "c\<^isub>i\<^isub>j"} are the labels of the transitions from these
%predecessor states to @{text X\<^isub>i}.
There can only be
- finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} in a right-hand side
+ finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"} in a right-hand side
since by assumption there are only finitely many
equivalence classes and only finitely many characters.
- The term @{text "\<lambda>(EMPTY)"} in the first equation acts as a marker for the initial state, that
+ The term @{text "\<lambda>(ONE)"} in the first equation acts as a marker for the initial state, that
is the equivalence class
containing @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
single `initial' state in the equational system, which is different from
@@ -598,7 +598,7 @@
be reached by adding a character to the end of @{text Y}. This is also the
reason why we have to use our reverse version of Arden's Lemma.}
%In our initial equation system there can only be
- %finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} in a right-hand side
+ %finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"} in a right-hand side
%since by assumption there are only finitely many
%equivalence classes and only finitely many characters.
Overloading the function @{text \<calL>} for the two kinds of terms in the
@@ -614,14 +614,14 @@
and we can prove for @{text "X\<^isub>2\<^isub>.\<^isub>.\<^isub>n"} that the following equations
%
\begin{equation}\label{inv1}
- @{text "X\<^isub>i = \<calL>(Y\<^isub>i\<^isub>1, CHAR c\<^isub>i\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>i\<^isub>q, CHAR c\<^isub>i\<^isub>q)"}.
+ @{text "X\<^isub>i = \<calL>(Y\<^isub>i\<^isub>1, ATOM c\<^isub>i\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>i\<^isub>q, ATOM c\<^isub>i\<^isub>q)"}.
\end{equation}
\noindent
hold. Similarly for @{text "X\<^isub>1"} we can show the following equation
%
\begin{equation}\label{inv2}
- @{text "X\<^isub>1 = \<calL>(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) \<union> \<calL>(\<lambda>(EMPTY))"}.
+ @{text "X\<^isub>1 = \<calL>(Y\<^isub>1\<^isub>1, ATOM c\<^isub>1\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>1\<^isub>p, ATOM c\<^isub>1\<^isub>p) \<union> \<calL>(\<lambda>(ONE))"}.
\end{equation}
\noindent
@@ -641,8 +641,8 @@
\mbox{\begin{tabular}{rcl}
@{thm (lhs) Init_rhs_def} & @{text "\<equiv>"} &
@{text "if"}~@{term "[] \<in> X"}\\
- & & @{text "then"}~@{term "{Trn Y (CHAR c) | Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X} \<union> {Lam EMPTY}"}\\
- & & @{text "else"}~@{term "{Trn Y (CHAR c)| Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X}"}\\
+ & & @{text "then"}~@{term "{Trn Y (ATOM c) | Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X} \<union> {Lam ONE}"}\\
+ & & @{text "else"}~@{term "{Trn Y (ATOM c)| Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X}"}\\
@{thm (lhs) Init_def} & @{text "\<equiv>"} & @{thm (rhs) Init_def}
\end{tabular}}
\end{equation}
@@ -986,6 +986,7 @@
section {* Myhill-Nerode, Second Part *}
text {*
+ \noindent
We will prove in this section the second part of the Myhill-Nerode
theorem. It can be formulated in our setting as follows:
@@ -999,7 +1000,7 @@
\begin{proof}[Base Cases]
- The cases for @{const NULL}, @{const EMPTY} and @{const CHAR} are routine, because
+ The cases for @{const ZERO}, @{const ONE} and @{const ATOM} are routine, because
we can easily establish that
\begin{center}
@@ -1093,9 +1094,9 @@
Chaining Lem.~\ref{finone} and \ref{fintwo} together, means in order to show
that @{term "UNIV // \<approx>(L r)"} is finite, we have to find a tagging-function whose
range can be shown to be finite and whose tagging-relation refines @{term "\<approx>(L r)"}.
- Let us attempt the @{const ALT}-case first.
+ Let us attempt the @{const PLUS}-case first.
- \begin{proof}[@{const "ALT"}-Case]
+ \begin{proof}[@{const "PLUS"}-Case]
We take as tagging-function
%
\begin{center}
@@ -1106,7 +1107,7 @@
where @{text "A"} and @{text "B"} are some arbitrary languages.
We can show in general, if @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
then @{term "finite ((UNIV // \<approx>A) \<times> (UNIV // \<approx>B))"} holds. The range of
- @{term "tag_str_ALT A B"} is a subset of this product set---so finite. It remains to be shown
+ @{term "tag_str_PLUS A B"} is a subset of this product set---so finite. It remains to be shown
that @{text "=tag\<^isub>A\<^isub>L\<^isub>T A B="} refines @{term "\<approx>(A \<union> B)"}. This amounts to
showing
%
@@ -1134,7 +1135,7 @@
\noindent
The pattern in \eqref{pattern} is repeated for the other two cases. Unfortunately,
- they are slightly more complicated. In the @{const SEQ}-case we essentially have
+ they are slightly more complicated. In the @{const TIMES}-case we essentially have
to be able to infer that
%
\begin{center}
@@ -1239,14 +1240,14 @@
with the idea that in the first split we have to make sure that @{text "(x - x') @ z"}
is in the language @{text B}.
- \begin{proof}[@{const SEQ}-Case]
+ \begin{proof}[@{const TIMES}-Case]
If @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
then @{term "finite ((UNIV // \<approx>A) \<times> (Pow (UNIV // \<approx>B)))"} holds. The range of
- @{term "tag_str_SEQ A B"} is a subset of this product set, and therefore finite.
+ @{term "tag_str_TIMES A B"} is a subset of this product set, and therefore finite.
We have to show injectivity of this tagging-function as
%
\begin{center}
- @{term "\<forall>z. tag_str_SEQ A B x = tag_str_SEQ A B y \<and> x @ z \<in> A \<cdot> B \<longrightarrow> y @ z \<in> A \<cdot> B"}
+ @{term "\<forall>z. tag_str_TIMES A B x = tag_str_TIMES A B y \<and> x @ z \<in> A \<cdot> B \<longrightarrow> y @ z \<in> A \<cdot> B"}
\end{center}
%
\noindent
@@ -1259,7 +1260,7 @@
\end{center}
%
\noindent
- and by the assumption about @{term "tag_str_SEQ A B"} also
+ and by the assumption about @{term "tag_str_TIMES A B"} also
%
\begin{center}
@{term "(\<approx>B `` {x - x'}) \<in> ({\<approx>B `` {y - y'} |y'. y' \<le> y \<and> y' \<in> A})"}
@@ -1274,15 +1275,15 @@
@{term "y @ z \<in> A \<cdot> B"}, as needed in this case.
Second, there exists a @{text "z'"} such that @{term "x @ z' \<in> A"} and @{text "z - z' \<in> B"}.
- By the assumption about @{term "tag_str_SEQ A B"} we have
+ By the assumption about @{term "tag_str_TIMES A B"} we have
@{term "\<approx>A `` {x} = \<approx>A `` {y}"} and thus @{term "x \<approx>A y"}. Which means by the Myhill-Nerode
relation that @{term "y @ z' \<in> A"} holds. Using @{text "z - z' \<in> B"}, we can conclude also in this case
- with @{term "y @ z \<in> A \<cdot> B"}. We again can complete the @{const SEQ}-case
+ with @{term "y @ z \<in> A \<cdot> B"}. We again can complete the @{const TIMES}-case
by setting @{text A} to @{term "L r\<^isub>1"} and @{text B} to @{term "L r\<^isub>2"}.\qed
\end{proof}
\noindent
- The case for @{const STAR} is similar to @{const SEQ}, but poses a few extra challenges. When
+ The case for @{const STAR} is similar to @{const TIMES}, but poses a few extra challenges. When
we analyse the case that @{text "x @ z"} is an element in @{term "A\<star>"} and @{text x} is not the
empty string, we
have the following picture:
@@ -1390,8 +1391,14 @@
section {* Second Part based on Partial Derivatives *}
text {*
- We briefly considered using the method Brzozowski presented in the
- Appendix of~\cite{Brzozowski64} in order to prove the second
+ \noindent
+ As we have seen in the previous section, in order to establish
+ the second direction of the Myhill-Nerode theorem, we need to find
+ a more refined relation (more refined than ??) for which we can
+ show that there are only finitely many equivalence classes.
+ Brzozowski presented in the Appendix of~\cite{Brzozowski64}
+
+ in order to prove the second
direction of the Myhill-Nerode theorem. There he calculates the
derivatives for regular expressions and shows that for every
language there can be only finitely many of them %derivations (if
@@ -1404,6 +1411,9 @@
of derivatives modulo ACI. Therefore we preferred our direct method
of using tagging-functions.
+ The problem of finiteness modulo ACI can be avoided by using partial
+ derivatives introduced by Antimirov \cite{Antimirov}.
+
*}
section {* Closure Properties *}
@@ -1501,11 +1511,12 @@
establishing the Myhill-Nerode theorem. All standard proofs of this
direction proceed by arguments over automata.\medskip
- We expect that the development of Krauss \& Nipkoe gets easier by
- using partial derivatives.
+ We expect that the development of Krauss \& Nipkow gets easier by
+ using partial derivatives.\medskip
\noindent
- {\bf Acknowledgements:} We are grateful for the comments we received from Larry
+ {\bf Acknowledgements:}
+ We are grateful for the comments we received from Larry
Paulson.
*}