regexp: comparison Paper/Paper.thy

equal deleted inserted replaced

-:fb08f41ca33d
+:070f543e2560
 text {*
 Regular languages are an important and well-understood subject in Computer
 Science, with many beautiful theorems and many useful algorithms. There is a
 wide range of textbooks on this subject, many of which are aimed at
-students containing very detailed ``pencil-and-paper'' proofs
+students and contain very detailed ``pencil-and-paper'' proofs
 (e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by
 formalising these theorems and by verifying formally the algorithms.
 There is however a problem with this: the typical approach to regular
-languages is to introduce automata and then define everything in terms of
+languages is to introduce finite automata and then define everything in terms of
 them.  For example, a regular language is normally defined as one where
 there is a finite deterministic automaton that recognises all the strings of
-the language. One benefit of this approach is that it is easy to convince
+the language. This approach has many benefits. One is that it is easy to convince
 oneself from the fact that regular languages are closed under
 complementation: one just has to exchange the accepting and non-accepting
-states in the automaton to obtain an automaton for the complement language.
+states in the corresponding automaton to obtain an automaton for the complement language.
 The problem lies with formalising such reasoning in a theorem
 prover, in our case Isabelle/HOL. Automata need to be represented as graphs
 or matrices, neither of which can be defined as inductive datatype.\footnote{In
 some works functions are used to represent transitions, but they are also not
 inductive datatypes.} This means we have to build our own reasoning infrastructure
-for them, as neither Isabelle nor HOL4 support them with libraries.
+for them, as neither Isabelle nor HOL4 nor HOLlight support them with libraries.
-Even worse, reasoning about graphs in typed languages can be a real hassle
-making proofs quite clunky. Consider for example the operation of combining
+Even worse, reasoning about graphs in typed languages can be a real hassle.
+Consider for example the operation of combining
 two automata into a new automaton by connecting their
-initial states to a new initial state (similarly with the accepting states):a
+initial states to a new initial state (similarly with the accepting states):
 \begin{center}
 picture
 \end{center}
 \noindent
 How should we implement this operation? On paper we can just
 form the disjoint union of the state nodes and add two more nodes---one for the
-new initial state, the other for the new accepting state.
+new initial state, the other for the new accepting state. In a theorem
+prover based on set-theory, this operaton can be more or less
+straightforwardly implemented. But in a HOL-based theorem prover the
+standard definition of disjoint unions as pairs
+\begin{center}
+definition
+\end{center}
+\noindent
-Therefore instead of defining a regular language as being one where there exists an
+changes the type (from sets of nodes to sets of pairs). This means we
-automata that recognises all of its strings, we define
+cannot formulate in this represeantation any property about \emph{all}
+automata---since there is no type quantification available in HOL-based
+theorem provers. A working alternative is to give every state node an
+identity, for example a natural number, and then be careful to rename
+these indentities appropriately when connecting two automata together.
+This results in very clunky side-proofs establishing that properties
+are invariant under renaming. We are only aware of the formalisation
+of automata theory in Nuprl that carries this approach trough and is
+quite substantial.
+We will take a completely different approach to formalising theorems
+about regular languages. Instead of defining a regular language as one
+where there exists an automaton that recognises all of its strings, we
+define a regular language as
 \begin{definition}[A Regular Language]
 A language @{text A} is regular, if there is a regular expression that matches all
 strings of @{text "A"}.
 \end{definition}
+\noindent
+The reason is that regular expressinons, unlike graphs and metrices, can
+be eaily defined as inductive datatype and this means a reasoning infrastructre
+comes for them in Isabelle for free. The purpose of this paper is to
+show that a central and highly non-trivisl result about regular languages,
+namely the Myhill-Nerode theorem,  can be recreated only using regular
+expressions. In our approach we do not need to formalise graps or
+metrices.
 \noindent
 {\bf Contributions:} A proof of the Myhill-Nerode Theorem based on regular expressions. The
 finiteness part of this theorem is proved using tagging-functions (which to our knowledge
 are novel in this context).

changeset 61	070f543e2560
parent 60	fb08f41ca33d
child 66	828ea293b61f