(*<*)theory Paperimports "../Myhill" "LaTeXsugar"begindeclare [[show_question_marks = false]]consts REL :: "(string \<times> string) \<Rightarrow> bool"notation (latex output) str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and Seq (infixr "\<cdot>" 100) and Star ("_\<^bsup>\<star>\<^esup>") and pow ("_\<^bsup>_\<^esup>" [100, 100] 100) and Suc ("_+1" [100] 100) and quotient ("_ \<^raw:\ensuremath{\!\sslash\!}> _" [90, 90] 90) and REL ("\<approx>")(*>*)section {* Introduction *}text {* Regular languages are an important and well-understood subject in Computer Science, with many beautiful theorems and many useful algorithms. There is a wide range of textbooks on this subject, many of which are aimed at students containing very detailed ``pencil-and-paper'' proofs (e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by formalising these theorems and by verifying formally the algorithms. There is however a problem with this: the typical approach to regular languages is to introduce automata and then define everything in terms of them. For example, a regular language is normally defined as one where there is a finite deterministic automaton that recognises all the strings of the language. One benefit of this approach is that it is easy to convince oneself from the fact that regular languages are closed under complementation: one just has to exchange the accepting and non-accepting states in the automaton to obtain an automaton for the complement language. The problem lies with formalising such reasoning in a theorem prover, in our case Isabelle/HOL. Automata need to be represented as graphs or matrices, neither of which can be defined as inductive datatype.\footnote{In some works functions are used to represent transitions, but they are also not inductive datatypes.} This means we have to build our own reasoning infrastructure for them, as neither Isabelle nor HOL4 support them with libraries. Even worse, reasoning about graphs in typed languages can be a real hassle making proofs quite clunky. Consider for example the operation of combining two automata into a new automaton by connecting their initial states to a new initial state (similarly with the accepting states):a \begin{center} picture \end{center} \noindent How should we implement this operation? On paper we can just form the disjoint union of the state nodes and add two more nodes---one for the new initial state, the other for the new accepting state. Therefore instead of defining a regular language as being one where there exists an automata that recognises all of its strings, we define \begin{definition}[A Regular Language] A language @{text A} is regular, if there is a regular expression that matches all strings of @{text "A"}. \end{definition} \noindent {\bf Contributions:} A proof of the Myhill-Nerode Theorem based on regular expressions. The finiteness part of this theorem is proved using tagging-functions (which to our knowledge are novel in this context).*}section {* Preliminaries *}text {* Strings in Isabelle/HOL are lists of characters and the \emph{empty string} is the empty list, written @{term "[]"}. \emph{Languages} are sets of strings. The language containing all strings is written in Isabelle/HOL as @{term "UNIV::string set"}. The notation for the quotient of a language @{text A} according to a relation @{term REL} is @{term "A // REL"}. The concatenation of two languages is written @{term "A ;; B"}; a language raised tow the power $n$ is written @{term "A \<up> n"}. Both concepts are defined as \begin{center} @{thm Seq_def[THEN eq_reflection, where A1="A" and B1="B"]} \hspace{7mm} @{thm pow.simps(1)[THEN eq_reflection, where A1="A"]} \hspace{7mm} @{thm pow.simps(2)[THEN eq_reflection, where A1="A" and n1="n"]} \end{center} \noindent where @{text "@"} is the usual list-append operation. The Kleene-star of a language @{text A} is defined as the union over all powers, namely @{thm Star_def}. Regular expressions are defined as the following datatype \begin{center} @{text r} @{text "::="} @{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} @{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} @{term "CHAR c"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} @{term "SEQ r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} @{term "ALT r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} @{term "STAR r"} \end{center} Central to our proof will be the solution of equational systems involving regular expressions. For this we will use the following ``reverse'' version of Arden's lemma. \begin{lemma}[Reverse Arden's Lemma]\mbox{}\\ If @{thm (prem 1) ardens_revised} then @{thm (lhs) ardens_revised} has the unique solution @{thm (rhs) ardens_revised}. \end{lemma} \begin{proof} For the right-to-left direction we assume @{thm (rhs) ardens_revised} and show that @{thm (lhs) ardens_revised} holds. From Lemma ??? we have @{term "A\<star> = {[]} \<union> A ;; A\<star>"}, which is equal to @{term "A\<star> = {[]} \<union> A\<star> ;; A"}. Adding @{text B} to both sides gives @{term "B ;; A\<star> = B ;; ({[]} \<union> A\<star> ;; A)"}, whose right-hand side is equal to @{term "(B ;; A\<star>) ;; A \<union> B"}. This completes this direction. For the other direction we assume @{thm (lhs) ardens_revised}. By a simple induction on @{text n}, we can establish the property \begin{center} @{text "(*)"}\hspace{5mm} @{thm (concl) ardens_helper} \end{center} \noindent Using this property we can show that @{term "B ;; (A \<up> n) \<subseteq> X"} holds for all @{text n}. From this we can infer @{term "B ;; A\<star> \<subseteq> X"} using Lemma ???. For the inclusion in the other direction we assume a string @{text s} with length @{text k} is element in @{text X}. Since @{thm (prem 1) ardens_revised} we know that @{term "s \<notin> X ;; (A \<up> Suc k)"} since its length is only @{text k} (the strings in @{term "X ;; (A \<up> Suc k)"} are all longer). From @{text "(*)"} it follows then that @{term s} must be element in @{term "(\<Union>m\<in>{0..k}. B ;; (A \<up> m))"}. This in turn implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Lemma ??? this is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed \end{proof}*}section {* Finite Partitions Imply Regularity of a Language *}text {* \begin{theorem} Given a language @{text A}. @{thm[mode=IfThen] hard_direction[where Lang="A"]} \end{theorem}*}section {* Regular Expressions Generate Finitely Many Partitions *}text {* \begin{theorem} Given @{text "r"} is a regular expressions, then @{thm rexp_imp_finite}. \end{theorem} \begin{proof} By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY} and @{const CHAR} are straightforward, because we can easily establish \begin{center} \begin{tabular}{l} @{thm quot_null_eq}\\ @{thm quot_empty_subset}\\ @{thm quot_char_subset} \end{tabular} \end{center} \end{proof}*}section {* Conclusion and Related Work *}(*<*)end(*>*)