--- a/Paper/Paper.thy Thu Feb 03 05:38:47 2011 +0000
+++ b/Paper/Paper.thy Thu Feb 03 09:54:19 2011 +0000
@@ -26,29 +26,29 @@
Regular languages are an important and well-understood subject in Computer
Science, with many beautiful theorems and many useful algorithms. There is a
wide range of textbooks on this subject, many of which are aimed at
- students containing very detailed ``pencil-and-paper'' proofs
+ students and contain very detailed ``pencil-and-paper'' proofs
(e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by
formalising these theorems and by verifying formally the algorithms.
There is however a problem with this: the typical approach to regular
- languages is to introduce automata and then define everything in terms of
+ languages is to introduce finite automata and then define everything in terms of
them. For example, a regular language is normally defined as one where
there is a finite deterministic automaton that recognises all the strings of
- the language. One benefit of this approach is that it is easy to convince
+ the language. This approach has many benefits. One is that it is easy to convince
oneself from the fact that regular languages are closed under
complementation: one just has to exchange the accepting and non-accepting
- states in the automaton to obtain an automaton for the complement language.
-
+ states in the corresponding automaton to obtain an automaton for the complement language.
The problem lies with formalising such reasoning in a theorem
prover, in our case Isabelle/HOL. Automata need to be represented as graphs
or matrices, neither of which can be defined as inductive datatype.\footnote{In
some works functions are used to represent transitions, but they are also not
inductive datatypes.} This means we have to build our own reasoning infrastructure
- for them, as neither Isabelle nor HOL4 support them with libraries.
- Even worse, reasoning about graphs in typed languages can be a real hassle
- making proofs quite clunky. Consider for example the operation of combining
+ for them, as neither Isabelle nor HOL4 nor HOLlight support them with libraries.
+
+ Even worse, reasoning about graphs in typed languages can be a real hassle.
+ Consider for example the operation of combining
two automata into a new automaton by connecting their
- initial states to a new initial state (similarly with the accepting states):a
+ initial states to a new initial state (similarly with the accepting states):
\begin{center}
picture
@@ -57,18 +57,31 @@
\noindent
How should we implement this operation? On paper we can just
form the disjoint union of the state nodes and add two more nodes---one for the
- new initial state, the other for the new accepting state.
+ new initial state, the other for the new accepting state. In a theorem
+ prover based on set-theory, this operaton can be more or less
+ straightforwardly implemented. But in a HOL-based theorem prover the
+ standard definition of disjoint unions as pairs
+ \begin{center}
+ definition
+ \end{center}
-
-
+ \noindent
+ changes the type (from sets of nodes to sets of pairs). This means we
+ cannot formulate in this represeantation any property about \emph{all}
+ automata---since there is no type quantification available in HOL-based
+ theorem provers. A working alternative is to give every state node an
+ identity, for example a natural number, and then be careful to rename
+ these indentities appropriately when connecting two automata together.
+ This results in very clunky side-proofs establishing that properties
+ are invariant under renaming. We are only aware of the formalisation
+ of automata theory in Nuprl that carries this approach trough and is
+ quite substantial.
-
-
-
-
- Therefore instead of defining a regular language as being one where there exists an
- automata that recognises all of its strings, we define
+ We will take a completely different approach to formalising theorems
+ about regular languages. Instead of defining a regular language as one
+ where there exists an automaton that recognises all of its strings, we
+ define a regular language as
\begin{definition}[A Regular Language]
A language @{text A} is regular, if there is a regular expression that matches all
@@ -76,6 +89,16 @@
\end{definition}
\noindent
+ The reason is that regular expressinons, unlike graphs and metrices, can
+ be eaily defined as inductive datatype and this means a reasoning infrastructre
+ comes for them in Isabelle for free. The purpose of this paper is to
+ show that a central and highly non-trivisl result about regular languages,
+ namely the Myhill-Nerode theorem, can be recreated only using regular
+ expressions. In our approach we do not need to formalise graps or
+ metrices.
+
+
+ \noindent
{\bf Contributions:} A proof of the Myhill-Nerode Theorem based on regular expressions. The
finiteness part of this theorem is proved using tagging-functions (which to our knowledge
are novel in this context).
--- a/Paper/document/root.tex Thu Feb 03 05:38:47 2011 +0000
+++ b/Paper/document/root.tex Thu Feb 03 09:54:19 2011 +0000
@@ -38,7 +38,7 @@
theorem provers. The reason is that they need to be represented as graphs
or matrices, neither of which can be defined as inductive datatype. Also
operations, such as disjoint unions of graphs, are not easily formalisiable
-in HOL. In contrast, regular expressions can be defined easily
+in HOL. In contrast, regular expressions can be defined conveniently
as datatype and a corresponding reasoning infrastructure comes for
free. We show in this paper that a central result from formal
language theory---the Myhill-Nerode theorem---can be recreated
Binary file tphols-2011/myhill.pdf has changed