# HG changeset patch # User urbanc # Date 1296726859 0 # Node ID 070f543e2560393e79c481411a951050c82e4c47 # Parent fb08f41ca33d4bfd7ca5a17364cea9393fbc72d0 more to the intro diff -r fb08f41ca33d -r 070f543e2560 Paper/Paper.thy --- a/Paper/Paper.thy Thu Feb 03 05:38:47 2011 +0000 +++ b/Paper/Paper.thy Thu Feb 03 09:54:19 2011 +0000 @@ -26,29 +26,29 @@ Regular languages are an important and well-understood subject in Computer Science, with many beautiful theorems and many useful algorithms. There is a wide range of textbooks on this subject, many of which are aimed at - students containing very detailed ``pencil-and-paper'' proofs + students and contain very detailed ``pencil-and-paper'' proofs (e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by formalising these theorems and by verifying formally the algorithms. There is however a problem with this: the typical approach to regular - languages is to introduce automata and then define everything in terms of + languages is to introduce finite automata and then define everything in terms of them. For example, a regular language is normally defined as one where there is a finite deterministic automaton that recognises all the strings of - the language. One benefit of this approach is that it is easy to convince + the language. This approach has many benefits. One is that it is easy to convince oneself from the fact that regular languages are closed under complementation: one just has to exchange the accepting and non-accepting - states in the automaton to obtain an automaton for the complement language. - + states in the corresponding automaton to obtain an automaton for the complement language. The problem lies with formalising such reasoning in a theorem prover, in our case Isabelle/HOL. Automata need to be represented as graphs or matrices, neither of which can be defined as inductive datatype.\footnote{In some works functions are used to represent transitions, but they are also not inductive datatypes.} This means we have to build our own reasoning infrastructure - for them, as neither Isabelle nor HOL4 support them with libraries. - Even worse, reasoning about graphs in typed languages can be a real hassle - making proofs quite clunky. Consider for example the operation of combining + for them, as neither Isabelle nor HOL4 nor HOLlight support them with libraries. + + Even worse, reasoning about graphs in typed languages can be a real hassle. + Consider for example the operation of combining two automata into a new automaton by connecting their - initial states to a new initial state (similarly with the accepting states):a + initial states to a new initial state (similarly with the accepting states): \begin{center} picture @@ -57,18 +57,31 @@ \noindent How should we implement this operation? On paper we can just form the disjoint union of the state nodes and add two more nodes---one for the - new initial state, the other for the new accepting state. + new initial state, the other for the new accepting state. In a theorem + prover based on set-theory, this operaton can be more or less + straightforwardly implemented. But in a HOL-based theorem prover the + standard definition of disjoint unions as pairs + \begin{center} + definition + \end{center} - - + \noindent + changes the type (from sets of nodes to sets of pairs). This means we + cannot formulate in this represeantation any property about \emph{all} + automata---since there is no type quantification available in HOL-based + theorem provers. A working alternative is to give every state node an + identity, for example a natural number, and then be careful to rename + these indentities appropriately when connecting two automata together. + This results in very clunky side-proofs establishing that properties + are invariant under renaming. We are only aware of the formalisation + of automata theory in Nuprl that carries this approach trough and is + quite substantial. - - - - - Therefore instead of defining a regular language as being one where there exists an - automata that recognises all of its strings, we define + We will take a completely different approach to formalising theorems + about regular languages. Instead of defining a regular language as one + where there exists an automaton that recognises all of its strings, we + define a regular language as \begin{definition}[A Regular Language] A language @{text A} is regular, if there is a regular expression that matches all @@ -76,6 +89,16 @@ \end{definition} \noindent + The reason is that regular expressinons, unlike graphs and metrices, can + be eaily defined as inductive datatype and this means a reasoning infrastructre + comes for them in Isabelle for free. The purpose of this paper is to + show that a central and highly non-trivisl result about regular languages, + namely the Myhill-Nerode theorem, can be recreated only using regular + expressions. In our approach we do not need to formalise graps or + metrices. + + + \noindent {\bf Contributions:} A proof of the Myhill-Nerode Theorem based on regular expressions. The finiteness part of this theorem is proved using tagging-functions (which to our knowledge are novel in this context). diff -r fb08f41ca33d -r 070f543e2560 Paper/document/root.tex --- a/Paper/document/root.tex Thu Feb 03 05:38:47 2011 +0000 +++ b/Paper/document/root.tex Thu Feb 03 09:54:19 2011 +0000 @@ -38,7 +38,7 @@ theorem provers. The reason is that they need to be represented as graphs or matrices, neither of which can be defined as inductive datatype. Also operations, such as disjoint unions of graphs, are not easily formalisiable -in HOL. In contrast, regular expressions can be defined easily +in HOL. In contrast, regular expressions can be defined conveniently as datatype and a corresponding reasoning infrastructure comes for free. We show in this paper that a central result from formal language theory---the Myhill-Nerode theorem---can be recreated diff -r fb08f41ca33d -r 070f543e2560 tphols-2011/myhill.pdf Binary file tphols-2011/myhill.pdf has changed