22 |
22 |
23 section {* Introduction *} |
23 section {* Introduction *} |
24 |
24 |
25 text {* |
25 text {* |
26 Regular languages are an important and well-understood subject in Computer |
26 Regular languages are an important and well-understood subject in Computer |
27 Science with many beautiful theorems and many useful algorithms. There is a |
27 Science, with many beautiful theorems and many useful algorithms. There is a |
28 wide range of textbooks about this subject. Many of which, such as |
28 wide range of textbooks on this subject, many of which are aimed at |
29 \cite{Kozen97}, are aimed at students and contain very detailed |
29 students containing very detailed ``pencil-and-paper'' proofs |
30 ``pencil-and-paper'' proofs. It seems natural to exercise theorem provers by |
30 (e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by |
31 formalising these theorems and by verifying formally the algorithms. |
31 formalising these theorems and by verifying formally the algorithms. |
32 |
32 |
33 There is however a problem: the typical approach to the subject of regular |
33 There is however a problem with this: the typical approach to regular |
34 languages is to introduce finite automata and define everything in terms of |
34 languages is to introduce automata and then define everything in terms of |
35 automata. For example, a regular language is nearly always defined as one |
35 them. For example, a regular language is normally defined as one where |
36 where there is a finite deterministic automata that recognises all the |
36 there is a finite deterministic automaton that recognises all the strings of |
37 strings of the language. One benefit of this approach is that it is easy |
37 the language. One benefit of this approach is that it is easy to convince |
38 to prove on paper that regular languages are closed under complementation: |
38 oneself from the fact that regular languages are closed under |
39 one just has to exchange the accepting and non-accepting states in the corresponding |
39 complementation: one just has to exchange the accepting and non-accepting |
40 automata into non-accepting. The problem with automata is that they need |
40 states in the automaton to obtain an automaton for the complement language. |
41 to be represented as graphs (nodes represent states and edges, transitions) |
41 |
42 or as matrices (recording the transitions between states). Both representations |
42 The problem lies with formalising such reasoning in a theorem |
43 cannot be defined in terms of inductive datatypes, and a reasoning infrastructure |
43 prover, in our case Isabelle/HOL. Automata need to be represented as graphs |
44 must be established manually. To our knowledge neither |
44 or matrices, neither of which can be defined as inductive datatype.\footnote{In |
45 Isabelle nor HOL4 nor Coq have a readily usable reasoning library for graphs |
45 some works functions are used to represent transitions, but they are also not |
46 and matrices. Moreover, the reasoning about graphs can be quite cumbersome. |
46 inductive datatypes.} This means we have to build our own reasoning infrastructure |
47 Cosider for example the frequently used operation of combinding two automata |
47 for them, as neither Isabelle nor HOL4 support them with libraries. |
48 into a new compound automaton. While on paper, this can be done by just forming |
48 Even worse, reasoning about graphs in typed languages can be a real hassle |
49 the disjoint union of the nodes, this does not work in typed systems \dots |
49 making proofs quite clunky. Consider for example the operation of combining |
|
50 two automata into a new automaton by connecting their |
|
51 initial states to a new initial state (similarly with the accepting states):a |
|
52 |
|
53 \begin{center} |
|
54 picture |
|
55 \end{center} |
|
56 |
|
57 \noindent |
|
58 How should we implement this operation? On paper we can just |
|
59 form the disjoint union of the state nodes and add two more nodes---one for the |
|
60 new initial state, the other for the new accepting state. |
|
61 |
|
62 |
50 |
63 |
51 |
64 |
52 |
65 |
53 |
66 |
54 |
67 |
55 |
68 |
56 |
69 |
57 Therefore instead of defining a regular language as being one where there exists an |
70 Therefore instead of defining a regular language as being one where there exists an |
58 automata that regognises all of its strings, we define |
71 automata that recognises all of its strings, we define |
59 |
72 |
60 \begin{definition}[A Regular Language] |
73 \begin{definition}[A Regular Language] |
61 A language @{text A} is regular, if there is a regular expression that matches all |
74 A language @{text A} is regular, if there is a regular expression that matches all |
62 strings of @{text "A"}. |
75 strings of @{text "A"}. |
63 \end{definition} |
76 \end{definition} |
64 |
77 |
65 \noindent |
78 \noindent |
66 {\bf Contributions:} A proof of the Myhil-Nerode Theorem based on regular expressions. The |
79 {\bf Contributions:} A proof of the Myhill-Nerode Theorem based on regular expressions. The |
67 finiteness part of this theorem is proved using tagging-functions (which to our knowledge |
80 finiteness part of this theorem is proved using tagging-functions (which to our knowledge |
68 are novel in this context). |
81 are novel in this context). |
69 |
82 |
70 *} |
83 *} |
71 |
84 |