--- a/Paper/Paper.thy Thu Feb 03 05:38:47 2011 +0000
+++ b/Paper/Paper.thy Thu Feb 03 09:54:19 2011 +0000
@@ -26,29 +26,29 @@
Regular languages are an important and well-understood subject in Computer
Science, with many beautiful theorems and many useful algorithms. There is a
wide range of textbooks on this subject, many of which are aimed at
- students containing very detailed ``pencil-and-paper'' proofs
+ students and contain very detailed ``pencil-and-paper'' proofs
(e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by
formalising these theorems and by verifying formally the algorithms.
There is however a problem with this: the typical approach to regular
- languages is to introduce automata and then define everything in terms of
+ languages is to introduce finite automata and then define everything in terms of
them. For example, a regular language is normally defined as one where
there is a finite deterministic automaton that recognises all the strings of
- the language. One benefit of this approach is that it is easy to convince
+ the language. This approach has many benefits. One is that it is easy to convince
oneself from the fact that regular languages are closed under
complementation: one just has to exchange the accepting and non-accepting
- states in the automaton to obtain an automaton for the complement language.
-
+ states in the corresponding automaton to obtain an automaton for the complement language.
The problem lies with formalising such reasoning in a theorem
prover, in our case Isabelle/HOL. Automata need to be represented as graphs
or matrices, neither of which can be defined as inductive datatype.\footnote{In
some works functions are used to represent transitions, but they are also not
inductive datatypes.} This means we have to build our own reasoning infrastructure
- for them, as neither Isabelle nor HOL4 support them with libraries.
- Even worse, reasoning about graphs in typed languages can be a real hassle
- making proofs quite clunky. Consider for example the operation of combining
+ for them, as neither Isabelle nor HOL4 nor HOLlight support them with libraries.
+
+ Even worse, reasoning about graphs in typed languages can be a real hassle.
+ Consider for example the operation of combining
two automata into a new automaton by connecting their
- initial states to a new initial state (similarly with the accepting states):a
+ initial states to a new initial state (similarly with the accepting states):
\begin{center}
picture
@@ -57,18 +57,31 @@
\noindent
How should we implement this operation? On paper we can just
form the disjoint union of the state nodes and add two more nodes---one for the
- new initial state, the other for the new accepting state.
+ new initial state, the other for the new accepting state. In a theorem
+ prover based on set-theory, this operaton can be more or less
+ straightforwardly implemented. But in a HOL-based theorem prover the
+ standard definition of disjoint unions as pairs
+ \begin{center}
+ definition
+ \end{center}
-
-
+ \noindent
+ changes the type (from sets of nodes to sets of pairs). This means we
+ cannot formulate in this represeantation any property about \emph{all}
+ automata---since there is no type quantification available in HOL-based
+ theorem provers. A working alternative is to give every state node an
+ identity, for example a natural number, and then be careful to rename
+ these indentities appropriately when connecting two automata together.
+ This results in very clunky side-proofs establishing that properties
+ are invariant under renaming. We are only aware of the formalisation
+ of automata theory in Nuprl that carries this approach trough and is
+ quite substantial.
-
-
-
-
- Therefore instead of defining a regular language as being one where there exists an
- automata that recognises all of its strings, we define
+ We will take a completely different approach to formalising theorems
+ about regular languages. Instead of defining a regular language as one
+ where there exists an automaton that recognises all of its strings, we
+ define a regular language as
\begin{definition}[A Regular Language]
A language @{text A} is regular, if there is a regular expression that matches all
@@ -76,6 +89,16 @@
\end{definition}
\noindent
+ The reason is that regular expressinons, unlike graphs and metrices, can
+ be eaily defined as inductive datatype and this means a reasoning infrastructre
+ comes for them in Isabelle for free. The purpose of this paper is to
+ show that a central and highly non-trivisl result about regular languages,
+ namely the Myhill-Nerode theorem, can be recreated only using regular
+ expressions. In our approach we do not need to formalise graps or
+ metrices.
+
+
+ \noindent
{\bf Contributions:} A proof of the Myhill-Nerode Theorem based on regular expressions. The
finiteness part of this theorem is proved using tagging-functions (which to our knowledge
are novel in this context).