7 |
7 |
8 consts |
8 consts |
9 REL :: "(string \<times> string) \<Rightarrow> bool" |
9 REL :: "(string \<times> string) \<Rightarrow> bool" |
10 UPLUS :: "'a set \<Rightarrow> 'a set \<Rightarrow> (nat \<times> 'a) set" |
10 UPLUS :: "'a set \<Rightarrow> 'a set \<Rightarrow> (nat \<times> 'a) set" |
11 |
11 |
|
12 abbreviation |
|
13 "EClass x R \<equiv> R `` {x}" |
12 |
14 |
13 notation (latex output) |
15 notation (latex output) |
14 str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and |
16 str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and |
15 Seq (infixr "\<cdot>" 100) and |
17 Seq (infixr "\<cdot>" 100) and |
16 Star ("_\<^bsup>\<star>\<^esup>") and |
18 Star ("_\<^bsup>\<star>\<^esup>") and |
17 pow ("_\<^bsup>_\<^esup>" [100, 100] 100) and |
19 pow ("_\<^bsup>_\<^esup>" [100, 100] 100) and |
18 Suc ("_+1" [100] 100) and |
20 Suc ("_+1" [100] 100) and |
19 quotient ("_ \<^raw:\ensuremath{\!\sslash\!}> _" [90, 90] 90) and |
21 quotient ("_ \<^raw:\ensuremath{\!\sslash\!}> _" [90, 90] 90) and |
20 REL ("\<approx>") and |
22 REL ("\<approx>") and |
21 UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and |
23 UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and |
22 L ("L '(_')" [0] 101) |
24 L ("L '(_')" [0] 101) and |
|
25 EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) |
23 (*>*) |
26 (*>*) |
|
27 |
24 |
28 |
25 section {* Introduction *} |
29 section {* Introduction *} |
26 |
30 |
27 text {* |
31 text {* |
28 Regular languages are an important and well-understood subject in Computer |
32 Regular languages are an important and well-understood subject in Computer |
39 benefits. Among them is that it is easy to convince oneself from the fact that |
43 benefits. Among them is that it is easy to convince oneself from the fact that |
40 regular languages are closed under complementation: one just has to exchange |
44 regular languages are closed under complementation: one just has to exchange |
41 the accepting and non-accepting states in the corresponding automaton to |
45 the accepting and non-accepting states in the corresponding automaton to |
42 obtain an automaton for the complement language. The problem, however, lies with |
46 obtain an automaton for the complement language. The problem, however, lies with |
43 formalising such reasoning in a HOL-based theorem prover, in our case |
47 formalising such reasoning in a HOL-based theorem prover, in our case |
44 Isabelle/HOL. Automata consist of states and transitions. They need to be represented |
48 Isabelle/HOL. Automata are build up from states and transitions that |
45 as graphs or matrices, neither |
49 need to be represented as graphs or matrices, neither |
46 of which can be defined as inductive datatype.\footnote{In some works |
50 of which can be defined as inductive datatype.\footnote{In some works |
47 functions are used to represent state transitions, but also they are not |
51 functions are used to represent state transitions, but also they are not |
48 inductive datatypes.} This means we have to build our own reasoning |
52 inductive datatypes.} This means we have to build our own reasoning |
49 infrastructure for them, as neither Isabelle/HOL nor HOL4 nor HOLlight support |
53 infrastructure for them, as neither Isabelle/HOL nor HOL4 nor HOLlight support |
50 them with libraries. |
54 them with libraries. |
123 Using this definition for disjoint unions means we do not have a single type for automata |
127 Using this definition for disjoint unions means we do not have a single type for automata |
124 and hence will not be able to state properties about \emph{all} |
128 and hence will not be able to state properties about \emph{all} |
125 automata, since there is no type quantification available in HOL. An |
129 automata, since there is no type quantification available in HOL. An |
126 alternative, which provides us with a single type for automata, is to give every |
130 alternative, which provides us with a single type for automata, is to give every |
127 state node an identity, for example a natural |
131 state node an identity, for example a natural |
128 number, and then be careful renaming these identities apart whenever |
132 number, and then be careful to rename these identities apart whenever |
129 connecting two automata. This results in clunky proofs |
133 connecting two automata. This results in clunky proofs |
130 establishing that properties are invariant under renaming. Similarly, |
134 establishing that properties are invariant under renaming. Similarly, |
131 connecting two automata represented as matrices results in very adhoc |
135 connecting two automata represented as matrices results in very adhoc |
132 constructions, which are not pleasant to reason about. |
136 constructions, which are not pleasant to reason about. |
133 |
137 |
134 Because of these problems to do with representing automata, there seems |
138 Because of these problems to do with representing automata, there seems |
135 to be no substantial formalisation of automata theory and regular languages |
139 to be no substantial formalisation of automata theory and regular languages |
136 carried out in a HOL-based theorem prover. We are only aware of the |
140 carried out in a HOL-based theorem prover. We are only aware of the |
137 large formalisation of the automata theory in Nuprl \cite{Constable00} and |
141 large formalisation of automata theory in Nuprl \cite{Constable00} and |
138 some smaller formalisations in Coq, for example \cite{Filliatre97}. |
142 some smaller formalisations in Coq, for example \cite{Filliatre97}. |
139 |
143 |
140 In this paper, we will not attempt to formalise automata theory, but take a completely |
144 In this paper, we will not attempt to formalise automata theory, but take a completely |
141 different approach to regular languages. Instead of defining a regular language as one |
145 different approach to regular languages. Instead of defining a regular language as one |
142 where there exists an automaton that recognises all strings of the language, we define |
146 where there exists an automaton that recognises all strings of the language, we define |
148 \end{definition} |
152 \end{definition} |
149 |
153 |
150 \noindent |
154 \noindent |
151 The reason is that regular expressions, unlike graphs and matrices, can |
155 The reason is that regular expressions, unlike graphs and matrices, can |
152 be easily defined as inductive datatype. Therefore a corresponding reasoning |
156 be easily defined as inductive datatype. Therefore a corresponding reasoning |
153 infrastructure comes for free. This has recently been used for formalising regular |
157 infrastructure comes for free. This has recently been used in HOL4 for formalising regular |
154 expression matching in HOL4 \cite{OwensSlind08}. The purpose of this paper is to |
158 expression matching based on derivatives \cite{OwensSlind08}. The purpose of this paper is to |
155 show that a central result about regular languages, the Myhill-Nerode theorem, |
159 show that a central result about regular languages, the Myhill-Nerode theorem, |
156 can be recreated by only using regular expressions. This theorem give a necessary |
160 can be recreated by only using regular expressions. This theorem gives a necessary |
157 and sufficient condition for when a language is regular. As a corollary of this |
161 and sufficient condition for when a language is regular. As a corollary of this |
158 theorem we can easily establish the usual closure properties, including |
162 theorem we can easily establish the usual closure properties, including |
159 complementation, for regular languages.\smallskip |
163 complementation, for regular languages.\smallskip |
160 |
164 |
161 \noindent |
165 \noindent |
259 @{thm (lhs) L_rexp.simps(6)[where r="r"]} & @{text "\<equiv>"} & |
263 @{thm (lhs) L_rexp.simps(6)[where r="r"]} & @{text "\<equiv>"} & |
260 @{thm (rhs) L_rexp.simps(6)[where r="r"]}\\ |
264 @{thm (rhs) L_rexp.simps(6)[where r="r"]}\\ |
261 \end{tabular} |
265 \end{tabular} |
262 \end{tabular} |
266 \end{tabular} |
263 \end{center} |
267 \end{center} |
|
268 |
264 *} |
269 *} |
265 |
270 |
266 section {* Finite Partitions Imply Regularity of a Language *} |
271 section {* Finite Partitions Imply Regularity of a Language *} |
267 |
272 |
268 text {* |
273 text {* |
|
274 \begin{definition}[Myhill-Nerode Relation]\mbox{}\\ |
|
275 @{thm str_eq_rel_def[simplified]} |
|
276 \end{definition} |
|
277 |
|
278 \begin{definition} @{text "finals A"} are the equivalence classes that contain |
|
279 strings from @{text A}\\ |
|
280 @{thm finals_def} |
|
281 \end{definition} |
|
282 |
|
283 @{thm lang_is_union_of_finals} |
|
284 |
|
285 |
269 \begin{theorem} |
286 \begin{theorem} |
270 Given a language @{text A}. |
287 Given a language @{text A}. |
271 @{thm[mode=IfThen] hard_direction[where Lang="A"]} |
288 @{thm[mode=IfThen] hard_direction} |
272 \end{theorem} |
289 \end{theorem} |
273 *} |
290 *} |
274 |
291 |
275 section {* Regular Expressions Generate Finitely Many Partitions *} |
292 section {* Regular Expressions Generate Finitely Many Partitions *} |
276 |
293 |