diff -r fb08f41ca33d -r 070f543e2560 Paper/Paper.thy --- a/Paper/Paper.thy Thu Feb 03 05:38:47 2011 +0000 +++ b/Paper/Paper.thy Thu Feb 03 09:54:19 2011 +0000 @@ -26,29 +26,29 @@ Regular languages are an important and well-understood subject in Computer Science, with many beautiful theorems and many useful algorithms. There is a wide range of textbooks on this subject, many of which are aimed at - students containing very detailed ``pencil-and-paper'' proofs + students and contain very detailed ``pencil-and-paper'' proofs (e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by formalising these theorems and by verifying formally the algorithms. There is however a problem with this: the typical approach to regular - languages is to introduce automata and then define everything in terms of + languages is to introduce finite automata and then define everything in terms of them. For example, a regular language is normally defined as one where there is a finite deterministic automaton that recognises all the strings of - the language. One benefit of this approach is that it is easy to convince + the language. This approach has many benefits. One is that it is easy to convince oneself from the fact that regular languages are closed under complementation: one just has to exchange the accepting and non-accepting - states in the automaton to obtain an automaton for the complement language. - + states in the corresponding automaton to obtain an automaton for the complement language. The problem lies with formalising such reasoning in a theorem prover, in our case Isabelle/HOL. Automata need to be represented as graphs or matrices, neither of which can be defined as inductive datatype.\footnote{In some works functions are used to represent transitions, but they are also not inductive datatypes.} This means we have to build our own reasoning infrastructure - for them, as neither Isabelle nor HOL4 support them with libraries. - Even worse, reasoning about graphs in typed languages can be a real hassle - making proofs quite clunky. Consider for example the operation of combining + for them, as neither Isabelle nor HOL4 nor HOLlight support them with libraries. + + Even worse, reasoning about graphs in typed languages can be a real hassle. + Consider for example the operation of combining two automata into a new automaton by connecting their - initial states to a new initial state (similarly with the accepting states):a + initial states to a new initial state (similarly with the accepting states): \begin{center} picture @@ -57,18 +57,31 @@ \noindent How should we implement this operation? On paper we can just form the disjoint union of the state nodes and add two more nodes---one for the - new initial state, the other for the new accepting state. + new initial state, the other for the new accepting state. In a theorem + prover based on set-theory, this operaton can be more or less + straightforwardly implemented. But in a HOL-based theorem prover the + standard definition of disjoint unions as pairs + \begin{center} + definition + \end{center} - - + \noindent + changes the type (from sets of nodes to sets of pairs). This means we + cannot formulate in this represeantation any property about \emph{all} + automata---since there is no type quantification available in HOL-based + theorem provers. A working alternative is to give every state node an + identity, for example a natural number, and then be careful to rename + these indentities appropriately when connecting two automata together. + This results in very clunky side-proofs establishing that properties + are invariant under renaming. We are only aware of the formalisation + of automata theory in Nuprl that carries this approach trough and is + quite substantial. - - - - - Therefore instead of defining a regular language as being one where there exists an - automata that recognises all of its strings, we define + We will take a completely different approach to formalising theorems + about regular languages. Instead of defining a regular language as one + where there exists an automaton that recognises all of its strings, we + define a regular language as \begin{definition}[A Regular Language] A language @{text A} is regular, if there is a regular expression that matches all @@ -76,6 +89,16 @@ \end{definition} \noindent + The reason is that regular expressinons, unlike graphs and metrices, can + be eaily defined as inductive datatype and this means a reasoning infrastructre + comes for them in Isabelle for free. The purpose of this paper is to + show that a central and highly non-trivisl result about regular languages, + namely the Myhill-Nerode theorem, can be recreated only using regular + expressions. In our approach we do not need to formalise graps or + metrices. + + + \noindent {\bf Contributions:} A proof of the Myhill-Nerode Theorem based on regular expressions. The finiteness part of this theorem is proved using tagging-functions (which to our knowledge are novel in this context).