34 Setalt ("\<^raw:\ensuremath{\bigplus}>_" [1000] 999) and |
34 Setalt ("\<^raw:\ensuremath{\bigplus}>_" [1000] 999) and |
35 append_rexp2 ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 100) and |
35 append_rexp2 ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 100) and |
36 append_rhs_rexp ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 50) and |
36 append_rhs_rexp ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 50) and |
37 uminus ("\<^raw:\ensuremath{\overline{>_\<^raw:}}>" [100] 100) and |
37 uminus ("\<^raw:\ensuremath{\overline{>_\<^raw:}}>" [100] 100) and |
38 tag_str_ALT ("tag\<^isub>A\<^isub>L\<^isub>T _ _" [100, 100] 100) and |
38 tag_str_ALT ("tag\<^isub>A\<^isub>L\<^isub>T _ _" [100, 100] 100) and |
39 tag_str_ALT ("tag\<^isub>A\<^isub>L\<^isub>T _ _ _" [100, 100, 100] 100) |
39 tag_str_ALT ("tag\<^isub>A\<^isub>L\<^isub>T _ _ _" [100, 100, 100] 100) and |
40 |
40 tag_str_SEQ ("tag\<^isub>S\<^isub>E\<^isub>Q _ _" [100, 100] 100) and |
|
41 tag_str_SEQ ("tag\<^isub>S\<^isub>E\<^isub>Q _ _ _" [100, 100, 100] 100) and |
|
42 tag_str_STAR ("tag\<^isub>S\<^isub>T\<^isub>A\<^isub>R _" [100] 100) and |
|
43 tag_str_STAR ("tag\<^isub>S\<^isub>T\<^isub>A\<^isub>R _ _" [100, 100] 100) |
41 lemma meta_eq_app: |
44 lemma meta_eq_app: |
42 shows "f \<equiv> \<lambda>x. g x \<Longrightarrow> f x \<equiv> g x" |
45 shows "f \<equiv> \<lambda>x. g x \<Longrightarrow> f x \<equiv> g x" |
43 by auto |
46 by auto |
44 |
47 |
45 (*>*) |
48 (*>*) |
263 \noindent |
266 \noindent |
264 In @{text "(ii)"} we use the notation @{term "length s"} for the length of a |
267 In @{text "(ii)"} we use the notation @{term "length s"} for the length of a |
265 string; this property states that if @{term "[] \<notin> A"} then the lengths of |
268 string; this property states that if @{term "[] \<notin> A"} then the lengths of |
266 the strings in @{term "A \<up> (Suc n)"} must be longer than @{text n}. We omit |
269 the strings in @{term "A \<up> (Suc n)"} must be longer than @{text n}. We omit |
267 the proofs for these properties, but invite the reader to consult our |
270 the proofs for these properties, but invite the reader to consult our |
268 formalisation.\footnote{Available at ???} |
271 formalisation.\footnote{Available at \url{http://www4.in.tum.de/~urbanc/cgi-bin/repos.cgi/regexp/}} |
269 |
272 |
270 The notation in Isabelle/HOL for the quotient of a language @{text A} according to an |
273 The notation in Isabelle/HOL for the quotient of a language @{text A} according to an |
271 equivalence relation @{term REL} is @{term "A // REL"}. We will write |
274 equivalence relation @{term REL} is @{term "A // REL"}. We will write |
272 @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined |
275 @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined |
273 as @{text "{y | y \<approx> x}"}. |
276 as @{text "{y | y \<approx> x}"}. |
997 in which set, @{text A} or @{text B}, the string @{term "x @ z"} is. Fianlly we |
1000 in which set, @{text A} or @{text B}, the string @{term "x @ z"} is. Fianlly we |
998 can discharge this case by setting @{text A} to @{term "L r\<^isub>1"} and @{text B} to @{term "L r\<^isub>2"}.\qed |
1001 can discharge this case by setting @{text A} to @{term "L r\<^isub>1"} and @{text B} to @{term "L r\<^isub>2"}.\qed |
999 \end{proof} |
1002 \end{proof} |
1000 |
1003 |
1001 |
1004 |
1002 @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]} |
1005 \noindent |
1003 |
1006 The pattern in \eqref{pattern} is repeated for the other two cases. Unfortunately, |
1004 @{thm tag_str_STAR_def[where ?L1.0="A"]} |
1007 they are slightly more complicated. |
|
1008 |
|
1009 |
|
1010 \begin{proof}[@{const SEQ}-Case] |
|
1011 |
|
1012 \begin{center} |
|
1013 @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B", THEN meta_eq_app]} |
|
1014 \end{center} |
|
1015 \end{proof} |
|
1016 |
|
1017 \begin{proof}[@{const STAR}-Case] |
|
1018 |
|
1019 \begin{center} |
|
1020 @{thm tag_str_STAR_def[where ?L1.0="A", THEN meta_eq_app]} |
|
1021 \end{center} |
|
1022 \end{proof} |
1005 *} |
1023 *} |
1006 |
1024 |
1007 |
1025 |
1008 |
1026 |
1009 section {* Conclusion and Related Work *} |
1027 section {* Conclusion and Related Work *} |
1044 expression, like there is for automata. For an implementation of a simple |
1062 expression, like there is for automata. For an implementation of a simple |
1045 regular expression matcher, whose correctness has been formally |
1063 regular expression matcher, whose correctness has been formally |
1046 established, we refer the reader to Owens and Slind \cite{OwensSlind08}. |
1064 established, we refer the reader to Owens and Slind \cite{OwensSlind08}. |
1047 |
1065 |
1048 |
1066 |
1049 Our formalisation consists of ??? lines of Isabelle/Isar code for the first |
1067 Our formalisation consists of 790 lines of Isabelle/Isar code for the first |
1050 direction and ??? for the second. While this might be seen as too large to |
1068 direction and 475 for the second, plus standard material about regular languages. |
|
1069 While this might be seen as too large to |
1051 count as a concise proof pearl, this should be seen in the context of the |
1070 count as a concise proof pearl, this should be seen in the context of the |
1052 work done by Constable at al \cite{Constable00} who formalised the |
1071 work done by Constable at al \cite{Constable00} who formalised the |
1053 Myhill-Nerode theorem in Nuprl using automata. |
1072 Myhill-Nerode theorem in Nuprl using automata. |
1054 They write that their |
1073 They write that their |
1055 four-member team needed something on the magnitute of 18 months for their |
1074 four-member team needed something on the magnitute of 18 months for their |
1058 arguments. Unlike Constable et al, who were able to follow the proofs from |
1077 arguments. Unlike Constable et al, who were able to follow the proofs from |
1059 \cite{HopcroftUllman69}, we had to find our own arguments. So for us the formalisation |
1078 \cite{HopcroftUllman69}, we had to find our own arguments. So for us the formalisation |
1060 was not the bottleneck. It is hard to gauge the size of a formalisation in |
1079 was not the bottleneck. It is hard to gauge the size of a formalisation in |
1061 Nurpl, but from what is shown in the Nuprl Math Library about their |
1080 Nurpl, but from what is shown in the Nuprl Math Library about their |
1062 development it seems substantially larger than ours. The code of |
1081 development it seems substantially larger than ours. The code of |
1063 ours can be found under |
1082 ours can be found in the Mercurial Repository at |
1064 |
1083 |
1065 \begin{center} |
1084 \begin{center} |
1066 ??? |
1085 \url{http://www4.in.tum.de/~urbanc/cgi-bin/repos.cgi/regexp/} |
1067 \end{center} |
1086 \end{center} |
1068 |
1087 |
1069 |
1088 |
1070 Our proof of the first direction is very much inspired by \emph{Brzozowski's |
1089 Our proof of the first direction is very much inspired by \emph{Brzozowski's |
1071 algebraic mehod} used to convert a finite automaton to a regular |
1090 algebraic mehod} used to convert a finite automaton to a regular |
1093 calculate the derivatives and then to count them. Therefore we preferred our |
1112 calculate the derivatives and then to count them. Therefore we preferred our |
1094 direct method of using tagging-functions involving equivalence classes. This |
1113 direct method of using tagging-functions involving equivalence classes. This |
1095 is also where our method shines, because we can completely side-step the |
1114 is also where our method shines, because we can completely side-step the |
1096 standard argument \cite{Kozen97} where automata need to be composed, which |
1115 standard argument \cite{Kozen97} where automata need to be composed, which |
1097 as stated in the Introduction is not so convenient to formalise in a |
1116 as stated in the Introduction is not so convenient to formalise in a |
1098 HOL-based theorem prover. |
1117 HOL-based theorem prover. However, it is also the direction where we had to |
|
1118 spend most of the `conceptual' time, as our proof-argument based on tagging functions |
|
1119 is new for establishing the Myhill-Nerode theorem. |
1099 |
1120 |
1100 *} |
1121 *} |
1101 |
1122 |
1102 |
1123 |
1103 (*<*) |
1124 (*<*) |