regexp: Paper/Paper.thy@070f543e2560


(*<*)
theory Paper
imports "../Myhill" "LaTeXsugar"
begin

declare [[show_question_marks = false]]

consts
 REL :: "(string \<times> string) \<Rightarrow> bool"


notation (latex output)
  str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and
  Seq (infixr "\<cdot>" 100) and
  Star ("_\<^bsup>\<star>\<^esup>") and
  pow ("_\<^bsup>_\<^esup>" [100, 100] 100) and
  Suc ("_+1" [100] 100) and
  quotient ("_ \<^raw:\ensuremath{\!\sslash\!}> _" [90, 90] 90) and
  REL ("\<approx>")

(*>*)

section {* Introduction *}

text {*
  Regular languages are an important and well-understood subject in Computer
  Science, with many beautiful theorems and many useful algorithms. There is a
  wide range of textbooks on this subject, many of which are aimed at
  students and contain very detailed ``pencil-and-paper'' proofs
  (e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by
  formalising these theorems and by verifying formally the algorithms.

  There is however a problem with this: the typical approach to regular
  languages is to introduce finite automata and then define everything in terms of
  them.  For example, a regular language is normally defined as one where
  there is a finite deterministic automaton that recognises all the strings of
  the language. This approach has many benefits. One is that it is easy to convince
  oneself from the fact that regular languages are closed under
  complementation: one just has to exchange the accepting and non-accepting
  states in the corresponding automaton to obtain an automaton for the complement language.
  The problem lies with formalising such reasoning in a theorem
  prover, in our case Isabelle/HOL. Automata need to be represented as graphs 
  or matrices, neither of which can be defined as inductive datatype.\footnote{In 
  some works functions are used to represent transitions, but they are also not 
  inductive datatypes.} This means we have to build our own reasoning infrastructure
  for them, as neither Isabelle nor HOL4 nor HOLlight support them with libraries.
  
  Even worse, reasoning about graphs in typed languages can be a real hassle. 
  Consider for example the operation of combining 
  two automata into a new automaton by connecting their
  initial states to a new initial state (similarly with the accepting states):
  
  \begin{center}
  picture
  \end{center}

  \noindent
  How should we implement this operation? On paper we can just
  form the disjoint union of the state nodes and add two more nodes---one for the
  new initial state, the other for the new accepting state. In a theorem
  prover based on set-theory, this operaton can be more or less
  straightforwardly implemented. But in a HOL-based theorem prover the
  standard definition of disjoint unions as pairs

  \begin{center}
  definition
  \end{center}

  \noindent
  changes the type (from sets of nodes to sets of pairs). This means we
  cannot formulate in this represeantation any property about \emph{all} 
  automata---since there is no type quantification available in HOL-based
  theorem provers. A working alternative is to give every state node an 
  identity, for example a natural number, and then be careful to rename
  these indentities appropriately when connecting two automata together. 
  This results in very clunky side-proofs establishing that properties
  are invariant under renaming. We are only aware of the formalisation 
  of automata theory in Nuprl that carries this approach trough and is
  quite substantial. 
  
  We will take a completely different approach to formalising theorems
  about regular languages. Instead of defining a regular language as one 
  where there exists an automaton that recognises all of its strings, we 
  define a regular language as

  \begin{definition}[A Regular Language]
  A language @{text A} is regular, if there is a regular expression that matches all
  strings of @{text "A"}.
  \end{definition}
  
  \noindent
  The reason is that regular expressinons, unlike graphs and metrices, can
  be eaily defined as inductive datatype and this means a reasoning infrastructre
  comes for them in Isabelle for free. The purpose of this paper is to
  show that a central and highly non-trivisl result about regular languages, 
  namely the Myhill-Nerode theorem,  can be recreated only using regular 
  expressions. In our approach we do not need to formalise graps or
  metrices.
  

  \noindent
  {\bf Contributions:} A proof of the Myhill-Nerode Theorem based on regular expressions. The 
  finiteness part of this theorem is proved using tagging-functions (which to our knowledge
  are novel in this context).
  
*}

section {* Preliminaries *}

text {*
  Strings in Isabelle/HOL are lists of characters and the
  \emph{empty string} is the empty list, written @{term "[]"}. \emph{Languages} are sets of 
  strings. The language containing all strings is written in Isabelle/HOL as @{term "UNIV::string set"}.
  The notation for the quotient of a language @{text A} according to a relation @{term REL} is
  @{term "A // REL"}. The concatenation of two languages is written @{term "A ;; B"}; a language 
  raised  tow the power $n$ is written @{term "A \<up> n"}. Both concepts are defined as

  \begin{center}
  @{thm Seq_def[THEN eq_reflection, where A1="A" and B1="B"]}
  \hspace{7mm}
  @{thm pow.simps(1)[THEN eq_reflection, where A1="A"]}
  \hspace{7mm}
  @{thm pow.simps(2)[THEN eq_reflection, where A1="A" and n1="n"]}
  \end{center}

  \noindent
  where @{text "@"} is the usual list-append operation. The Kleene-star of a language @{text A}
  is defined as the union over all powers, namely @{thm Star_def}.
  

  Regular expressions are defined as the following datatype

  \begin{center}
  @{text r} @{text "::="}
  @{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
  @{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
  @{term "CHAR c"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
  @{term "SEQ r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
  @{term "ALT r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
  @{term "STAR r"}
  \end{center}

  Central to our proof will be the solution of equational systems
  involving regular expressions. For this we will use the following ``reverse'' 
  version of Arden's lemma.

  \begin{lemma}[Reverse Arden's Lemma]\mbox{}\\
  If @{thm (prem 1) ardens_revised} then
  @{thm (lhs) ardens_revised} has the unique solution
  @{thm (rhs) ardens_revised}.
  \end{lemma}

  \begin{proof}
  For the right-to-left direction we assume @{thm (rhs) ardens_revised} and show
  that @{thm (lhs) ardens_revised} holds. From Lemma ??? we have @{term "A\<star> = {[]} \<union> A ;; A\<star>"},
  which is equal to @{term "A\<star> = {[]} \<union> A\<star> ;; A"}. Adding @{text B} to both 
  sides gives @{term "B ;; A\<star> = B ;; ({[]} \<union> A\<star> ;; A)"}, whose right-hand side
  is equal to @{term "(B ;; A\<star>) ;; A \<union> B"}. This completes this direction. 

  For the other direction we assume @{thm (lhs) ardens_revised}. By a simple induction
  on @{text n}, we can establish the property

  \begin{center}
  @{text "(*)"}\hspace{5mm} @{thm (concl) ardens_helper}
  \end{center}
  
  \noindent
  Using this property we can show that @{term "B ;; (A \<up> n) \<subseteq> X"} holds for
  all @{text n}. From this we can infer @{term "B ;; A\<star> \<subseteq> X"} using Lemma ???.
  For the inclusion in the other direction we assume a string @{text s}
  with length @{text k} is element in @{text X}. Since @{thm (prem 1) ardens_revised}
  we know that @{term "s \<notin> X ;; (A \<up> Suc k)"} since its length is only @{text k}
  (the strings in @{term "X ;; (A \<up> Suc k)"} are all longer). 
  From @{text "(*)"} it follows then that
  @{term s} must be element in @{term "(\<Union>m\<in>{0..k}. B ;; (A \<up> m))"}. This in turn
  implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Lemma ??? this
  is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed
  \end{proof}
*}

section {* Finite Partitions Imply Regularity of a Language *}

text {*
  \begin{theorem}
  Given a language @{text A}.
  @{thm[mode=IfThen] hard_direction[where Lang="A"]}
  \end{theorem}
*}

section {* Regular Expressions Generate Finitely Many Partitions *}

text {*

  \begin{theorem}
  Given @{text "r"} is a regular expressions, then @{thm rexp_imp_finite}.
  \end{theorem}  

  \begin{proof}
  By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY}
  and @{const CHAR} are straightforward, because we can easily establish

  \begin{center}
  \begin{tabular}{l}
  @{thm quot_null_eq}\\
  @{thm quot_empty_subset}\\
  @{thm quot_char_subset}
  \end{tabular}
  \end{center}

  \end{proof}
*}


section {* Conclusion and Related Work *}

(*<*)
end
(*>*)
author	urbanc
	Thu, 03 Feb 2011 09:54:19 +0000
changeset 61	070f543e2560
parent 60	fb08f41ca33d
child 66	828ea293b61f
permissions	-rw-r--r--