regexp: Paper/Paper.thy@fc35eb54fdc9 (annotated)

(*<*)
theory Paper
imports "../Myhill" "LaTeXsugar"
begin

declare [[show_question_marks = false]]

consts
REL :: "(string \<times> string) \<Rightarrow> bool"

notation (latex output)
str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and
Seq (infixr "\<cdot>" 100) and
Star ("_\<^bsup>\<star>\<^esup>") and
pow ("_\<^bsup>_\<^esup>" [100, 100] 100) and
Suc ("_+1" [100] 100) and
quotient ("_ \<^raw:\ensuremath{\!\sslash\!}> _" [90, 90] 90) and
REL ("\<approx>")

(*>*)

section {* Introduction *}

text {*
Regular languages are an important and well-understood subject in Computer
Science, with many beautiful theorems and many useful algorithms. There is a
wide range of textbooks on this subject, many of which, such as
\cite{Kozen97}, are aimed at students and contain very detailed
``pencil-and-paper'' proofs. It seems natural to exercise theorem provers by
formalising these theorems and by formally verifying the algorithms.

There is, however, a problem: the typical approach to the subject of regular
languages is to introduce finite automata and to define everything in terms of
automata. For example, a regular language is nearly always defined as one
for which there is a finite deterministic automaton that recognises all the
strings of the language. One benefit of this approach is that it is easy
to prove on paper that regular languages are closed under complementation:
one just has to exchange the accepting and non-accepting states of the
corresponding automaton. The problem with automata is that they need
to be represented as graphs (nodes represent states and edges represent transitions)
or as matrices (recording the transitions between states). Neither representation
can be defined in terms of inductive datatypes, and a reasoning infrastructure
must therefore be established manually. To our knowledge, neither
Isabelle nor HOL4 nor Coq has a readily usable reasoning library for graphs
and matrices. Moreover, reasoning about graphs can be quite cumbersome.
Consider, for example, the frequently used operation of combining two automata
into a new compound automaton. While on paper this can be done by just forming
the disjoint union of the nodes, this does not work in typed systems \dots

Therefore, instead of defining a regular language as one for which there exists an
automaton that recognises all of its strings, we define

\begin{definition}[A Regular Language]
A language @{text A} is regular if there is a regular expression that matches exactly
the strings of @{text "A"}.
\end{definition}

\noindent
{\bf Contributions:} A proof of the Myhill-Nerode Theorem based on regular expressions. The
finiteness part of this theorem is proved using tagging functions (which, to our knowledge,
are novel in this context).

*}

section {* Preliminaries *}

text {*
Strings in Isabelle/HOL are lists of characters, and the
\emph{empty string} is the empty list, written @{term "[]"}. \emph{Languages} are sets of
strings. The language containing all strings is written in Isabelle/HOL as @{term "UNIV::string set"}.
The quotient of a language @{text A} according to a relation @{term REL} is written
@{term "A // REL"}. The concatenation of two languages is written @{term "A ;; B"}; a language
raised to the power $n$ is written @{term "A \<up> n"}. Both concepts are defined as

\begin{center}
@{thm Seq_def[THEN eq_reflection, where A1="A" and B1="B"]}
\hspace{7mm}
@{thm pow.simps(1)[THEN eq_reflection, where A1="A"]}
\hspace{7mm}
@{thm pow.simps(2)[THEN eq_reflection, where A1="A" and n1="n"]}
\end{center}

\noindent
where @{text "@"} is the usual list-append operation. The Kleene star of a language @{text A}
is defined as the union over all powers, namely @{thm Star_def}.
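
These three definitions can be tried out on finite approximations of languages. The following
Python sketch is only an illustration and is not part of the formalisation. Powers and the
Kleene star of a language are in general infinite, so every function keeps only the strings
up to a length bound \texttt{maxlen}. The names \texttt{concat}, \texttt{power} and
\texttt{star} are chosen for the sketch. Following the equations for @{text pow} above, we
assume that the recursive case concatenates @{text A} on the left.

```python
from typing import FrozenSet

Lang = FrozenSet[str]


def concat(a: Lang, b: Lang, maxlen: int) -> Lang:
    """A ;; B: all strings s + t with s in A and t in B, keeping only those
    of length at most maxlen."""
    return frozenset(s + t for s in a for t in b if len(s) + len(t) <= maxlen)


def power(a: Lang, n: int, maxlen: int) -> Lang:
    """The n-th power of A: the 0-th power contains only the empty string,
    and the (n+1)-th power is A concatenated with the n-th power."""
    if n == 0:
        return frozenset([""])
    return concat(a, power(a, n - 1, maxlen), maxlen)


def star(a: Lang, maxlen: int) -> Lang:
    """The Kleene star as the union of all powers.  Powers 0 to maxlen are
    enough: after dropping empty factors, a string of length at most maxlen
    is a product of at most maxlen non-empty strings from A."""
    result: Lang = frozenset()
    for n in range(maxlen + 1):
        result = result | power(a, n, maxlen)
    return result
```

For example, take \texttt{A = frozenset(["a", "bc"])} with bound 3. Then \texttt{star(A, 3)}
consists of seven strings: the empty string, a, bc, aa, abc, bca and aaa.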

Regular expressions are defined as the following datatype:

\begin{center}
@{text r} @{text "::="}
@{term NULL}\hspace{1.5mm}@{text"\|"}\hspace{1.5mm}
@{term EMPTY}\hspace{1.5mm}@{text"\|"}\hspace{1.5mm}
@{term "CHAR c"}\hspace{1.5mm}@{text"\|"}\hspace{1.5mm}
@{term "SEQ r r"}\hspace{1.5mm}@{text"\|"}\hspace{1.5mm}
@{term "ALT r r"}\hspace{1.5mm}@{text"\|"}\hspace{1.5mm}
@{term "STAR r"}
\end{center}
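
As an illustration outside the formalisation, the datatype and the usual meaning of its
constructors can be sketched in Python:

\begin{itemize}
\item @{const NULL} denotes the empty language;
\item @{const EMPTY} denotes the language containing only the empty string;
\item @{term "CHAR c"} denotes the language containing only the one-character string @{text c};
\item @{const SEQ}, @{const ALT} and @{const STAR} denote concatenation, union and Kleene star.
\end{itemize}

\noindent
Since these languages can be infinite, the function \texttt{lang} computes only the strings up
to a length bound \texttt{maxlen}. The class names and the functions \texttt{lang} and
\texttt{matches} are chosen for this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Null:
    """NULL: matches no string at all."""


@dataclass(frozen=True)
class Empty:
    """EMPTY: matches only the empty string."""


@dataclass(frozen=True)
class Char:
    """CHAR c: matches exactly the one-character string c."""
    c: str


@dataclass(frozen=True)
class Seq:
    """SEQ r1 r2: a match of r1 followed by a match of r2."""
    r1: "Rexp"
    r2: "Rexp"


@dataclass(frozen=True)
class Alt:
    """ALT r1 r2: a match of r1 or a match of r2."""
    r1: "Rexp"
    r2: "Rexp"


@dataclass(frozen=True)
class Star:
    """STAR r: zero or more matches of r in sequence."""
    r: "Rexp"


Rexp = Union[Null, Empty, Char, Seq, Alt, Star]


def lang(r: Rexp, maxlen: int) -> FrozenSet[str]:
    """The language of r, restricted to strings of length at most maxlen.
    The Star case adds one more power per round until nothing new appears;
    the length bound guarantees that this loop terminates."""
    if isinstance(r, Null):
        return frozenset()
    if isinstance(r, Empty):
        return frozenset([""])
    if isinstance(r, Char):
        return frozenset([r.c]) if len(r.c) <= maxlen else frozenset()
    if isinstance(r, Seq):
        return frozenset(s + t
                         for s in lang(r.r1, maxlen)
                         for t in lang(r.r2, maxlen)
                         if len(s) + len(t) <= maxlen)
    if isinstance(r, Alt):
        return lang(r.r1, maxlen) | lang(r.r2, maxlen)
    if isinstance(r, Star):
        base = lang(r.r, maxlen)
        result = frozenset([""])
        while True:
            grown = result | frozenset(s + t for s in base for t in result
                                       if len(s) + len(t) <= maxlen)
            if grown == result:
                return result
            result = grown
    raise TypeError("not a regular expression")


def matches(r: Rexp, s: str) -> bool:
    """Whether s is in the language of r; the bound len(s) is enough."""
    return s in lang(r, len(s))
```

For instance, let \texttt{r = Star(Alt(Char("a"), Seq(Char("b"), Char("c"))))}. Then
\texttt{matches(r, "abca")} holds, while \texttt{matches(r, "cb")} does not.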

Central to our proof will be the solution of equational systems
involving regular expressions. For this we will use the following ``reverse''
version of Arden's lemma.

\begin{lemma}[Reverse Arden's Lemma]\mbox{}\\
If @{thm (prem 1) ardens_revised}, then
@{thm (lhs) ardens_revised} has the unique solution
@{thm (rhs) ardens_revised}.
\end{lemma}

\begin{proof}
For the right-to-left direction we assume @{thm (rhs) ardens_revised} and show
that @{thm (lhs) ardens_revised} holds. From Lemma ??? we have @{term "A\<star> = {[]} \<union> A ;; A\<star>"},
which is equal to @{term "A\<star> = {[]} \<union> A\<star> ;; A"}. Concatenating @{text B} on the left of both
sides gives @{term "B ;; A\<star> = B ;; ({[]} \<union> A\<star> ;; A)"}, whose right-hand side
is equal to @{term "(B ;; A\<star>) ;; A \<union> B"}. This completes this direction.

For the other direction we assume @{thm (lhs) ardens_revised}. By a simple induction
on @{text n}, we can establish the property

\begin{center}
@{text "(*)"}\hspace{5mm} @{thm (concl) ardens_helper}
\end{center}

\noindent
Using this property we can show that @{term "B ;; (A \<up> n) \<subseteq> X"} holds for
all @{text n}. From this we can infer @{term "B ;; A\<star> \<subseteq> X"} using Lemma ???.
For the inclusion in the other direction we assume that a string @{text s}
of length @{text k} is an element of @{text X}. Since @{thm (prem 1) ardens_revised},
every string in @{term "X ;; (A \<up> Suc k)"} is longer than @{text k}, and
therefore @{term "s \<notin> X ;; (A \<up> Suc k)"}.
From @{text "(*)"} it then follows that
@{term s} must be an element of @{term "(\<Union>m\<in>{0..k}. B ;; (A \<up> m))"}. This in turn
implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Lemma ??? this set
is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed
\end{proof}
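
As a sanity check outside the formal development, the following Python sketch tests the lemma
on finite approximations of languages. All languages are cut off at a length bound
\texttt{maxlen}, which is an assumption of the sketch. The function \texttt{solves} tests the
equation, and \texttt{iterate} builds a solution by repeatedly applying the equation's
right-hand side. Both names are chosen for this illustration. When the empty string is not in
@{text A}, \texttt{maxlen + 1} rounds suffice, which mirrors the length argument in the proof.

```python
from typing import FrozenSet

Lang = FrozenSet[str]


def concat(a: Lang, b: Lang, maxlen: int) -> Lang:
    """A ;; B restricted to strings of length at most maxlen."""
    return frozenset(s + t for s in a for t in b if len(s) + len(t) <= maxlen)


def star(a: Lang, maxlen: int) -> Lang:
    """Kleene star of A restricted to length maxlen, computed as the least
    language that contains the empty string and is closed under putting a
    string of A in front."""
    result: Lang = frozenset([""])
    while True:
        grown = result | concat(a, result, maxlen)
        if grown == result:
            return result
        result = grown


def solves(x: Lang, a: Lang, b: Lang, maxlen: int) -> bool:
    """Whether X = X ;; A union B holds for all strings up to length maxlen.
    X and B are assumed to contain no strings longer than maxlen."""
    return x == (concat(x, a, maxlen) | b)


def iterate(a: Lang, b: Lang, maxlen: int) -> Lang:
    """Start from the empty language and apply X := X ;; A union B, maxlen + 1
    times.  After k rounds, X is the union of B ;; (A to the power m) for all
    m < k.  If the empty string is not in A, every string in B ;; (A to the
    power m) has length at least m, so maxlen + 1 rounds reach B ;; star(A)."""
    x: Lang = frozenset()
    for k in range(maxlen + 1):
        x = concat(x, a, maxlen) | b
    return x
```

For example, take \texttt{a = frozenset(["a"])}, \texttt{b = frozenset(["b"])} and bound 4.
Both \texttt{concat(b, star(a, 4), 4)} and \texttt{iterate(a, b, 4)} yield the strings b, ba,
baa and baaa, and \texttt{solves} accepts them. The premise cannot be dropped. For
\texttt{a = frozenset([""])} and \texttt{b = frozenset()}, every language solves the equation,
including both the empty language and \texttt{frozenset(["c"])}.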
*}

section {* Finite Partitions Imply Regularity of a Language *}

text {*
\begin{theorem}
Let @{text A} be a language.
@{thm[mode=IfThen] hard_direction[where Lang="A"]}
\end{theorem}
*}

section {* Regular Expressions Generate Finitely Many Partitions *}

text {*

\begin{theorem}
If @{text "r"} is a regular expression, then @{thm rexp_imp_finite}.
\end{theorem}

\begin{proof}
By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY}
and @{const CHAR} are straightforward, because we can easily establish

\begin{center}
\begin{tabular}{l}
@{thm quot_null_eq}\\
@{thm quot_empty_subset}\\
@{thm quot_char_subset}
\end{tabular}
\end{center}

\end{proof}
*}

section {* Conclusion and Related Work *}

(*<*)
end
(*>*)

author	urbanc
	Wed, 02 Feb 2011 15:43:22 +0000
changeset 59	fc35eb54fdc9
parent 58	0d4d5bb321dc
child 60	fb08f41ca33d
permissions	-rw-r--r--