Paper/Paper.thy
changeset 70 8ab3a06577cf
parent 67 7478be786f87
child 71 426070e68b21
equal deleted inserted replaced
69:ecf6c61a4541 70:8ab3a06577cf
     7 
     7 
     8 consts
     8 consts
     9  REL :: "(string \<times> string) \<Rightarrow> bool"
     9  REL :: "(string \<times> string) \<Rightarrow> bool"
    10  UPLUS :: "'a set \<Rightarrow> 'a set \<Rightarrow> (nat \<times> 'a) set"
    10  UPLUS :: "'a set \<Rightarrow> 'a set \<Rightarrow> (nat \<times> 'a) set"
    11 
    11 
       
    12 abbreviation
       
    13   "EClass x R \<equiv> R `` {x}"
    12 
    14 
    13 notation (latex output)
    15 notation (latex output)
    14   str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and
    16   str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and
    15   Seq (infixr "\<cdot>" 100) and
    17   Seq (infixr "\<cdot>" 100) and
    16   Star ("_\<^bsup>\<star>\<^esup>") and
    18   Star ("_\<^bsup>\<star>\<^esup>") and
    17   pow ("_\<^bsup>_\<^esup>" [100, 100] 100) and
    19   pow ("_\<^bsup>_\<^esup>" [100, 100] 100) and
    18   Suc ("_+1" [100] 100) and
    20   Suc ("_+1" [100] 100) and
    19   quotient ("_ \<^raw:\ensuremath{\!\sslash\!}> _" [90, 90] 90) and
    21   quotient ("_ \<^raw:\ensuremath{\!\sslash\!}> _" [90, 90] 90) and
    20   REL ("\<approx>") and
    22   REL ("\<approx>") and
    21   UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and
    23   UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and
    22   L ("L '(_')" [0] 101)
    24   L ("L '(_')" [0] 101) and
       
    25   EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) 
    23 (*>*)
    26 (*>*)
       
    27 
    24 
    28 
    25 section {* Introduction *}
    29 section {* Introduction *}
    26 
    30 
    27 text {*
    31 text {*
    28   Regular languages are an important and well-understood subject in Computer
    32   Regular languages are an important and well-understood subject in Computer
    39   benefits. Among them is that it is easy to convince oneself from the fact that
    43   benefits. Among them is that it is easy to convince oneself from the fact that
    40   regular languages are closed under complementation: one just has to exchange
    44   regular languages are closed under complementation: one just has to exchange
    41   the accepting and non-accepting states in the corresponding automaton to
    45   the accepting and non-accepting states in the corresponding automaton to
    42   obtain an automaton for the complement language.  The problem, however, lies with
    46   obtain an automaton for the complement language.  The problem, however, lies with
    43   formalising such reasoning in a HOL-based theorem prover, in our case
    47   formalising such reasoning in a HOL-based theorem prover, in our case
    44   Isabelle/HOL. Automata consist of states and transitions. They need to be represented 
    48   Isabelle/HOL. Automata are build up from states and transitions that 
    45   as graphs or matrices, neither
    49   need to be represented as graphs or matrices, neither
    46   of which can be defined as inductive datatype.\footnote{In some works
    50   of which can be defined as inductive datatype.\footnote{In some works
    47   functions are used to represent state transitions, but also they are not
    51   functions are used to represent state transitions, but also they are not
    48   inductive datatypes.} This means we have to build our own reasoning
    52   inductive datatypes.} This means we have to build our own reasoning
    49   infrastructure for them, as neither Isabelle/HOL nor HOL4 nor HOLlight support
    53   infrastructure for them, as neither Isabelle/HOL nor HOL4 nor HOLlight support
    50   them with libraries.
    54   them with libraries.
   123   Using this definition for disjoint unions means we do not have a single type for automata
   127   Using this definition for disjoint unions means we do not have a single type for automata
   124   and hence will not be able to state properties about \emph{all}
   128   and hence will not be able to state properties about \emph{all}
   125   automata, since there is no type quantification available in HOL. An
   129   automata, since there is no type quantification available in HOL. An
   126   alternative, which provides us with a single type for automata, is to give every 
   130   alternative, which provides us with a single type for automata, is to give every 
   127   state node an identity, for example a natural
   131   state node an identity, for example a natural
   128   number, and then be careful renaming these identities apart whenever
   132   number, and then be careful to rename these identities apart whenever
   129   connecting two automata. This results in clunky proofs
   133   connecting two automata. This results in clunky proofs
   130   establishing that properties are invariant under renaming. Similarly,
   134   establishing that properties are invariant under renaming. Similarly,
   131   connecting two automata represented as matrices results in very adhoc
   135   connecting two automata represented as matrices results in very adhoc
   132   constructions, which are not pleasant to reason about.
   136   constructions, which are not pleasant to reason about.
   133 
   137 
   134   Because of these problems to do with representing automata, there seems
   138   Because of these problems to do with representing automata, there seems
   135   to be no substantial formalisation of automata theory and regular languages 
   139   to be no substantial formalisation of automata theory and regular languages 
   136   carried out in a HOL-based theorem prover. We are only aware of the 
   140   carried out in a HOL-based theorem prover. We are only aware of the 
   137   large formalisation of the automata theory in Nuprl \cite{Constable00} and 
   141   large formalisation of automata theory in Nuprl \cite{Constable00} and 
   138   some smaller formalisations in Coq, for example \cite{Filliatre97}.
   142   some smaller formalisations in Coq, for example \cite{Filliatre97}.
   139   
   143   
   140   In this paper, we will not attempt to formalise automata theory, but take a completely 
   144   In this paper, we will not attempt to formalise automata theory, but take a completely 
   141   different approach to regular languages. Instead of defining a regular language as one 
   145   different approach to regular languages. Instead of defining a regular language as one 
   142   where there exists an automaton that recognises all strings of the language, we define 
   146   where there exists an automaton that recognises all strings of the language, we define 
   148   \end{definition}
   152   \end{definition}
   149   
   153   
   150   \noindent
   154   \noindent
   151   The reason is that regular expressions, unlike graphs and matrices, can
   155   The reason is that regular expressions, unlike graphs and matrices, can
   152   be easily defined as inductive datatype. Therefore a corresponding reasoning 
   156   be easily defined as inductive datatype. Therefore a corresponding reasoning 
   153   infrastructure comes for free. This has recently been used for formalising regular 
   157   infrastructure comes for free. This has recently been used in HOL4 for formalising regular 
   154   expression matching in HOL4 \cite{OwensSlind08}.  The purpose of this paper is to
   158   expression matching based on derivatives \cite{OwensSlind08}.  The purpose of this paper is to
   155   show that a central result about regular languages, the Myhill-Nerode theorem,  
   159   show that a central result about regular languages, the Myhill-Nerode theorem,  
   156   can be recreated by only using regular expressions. This theorem give a necessary
   160   can be recreated by only using regular expressions. This theorem gives a necessary
   157   and sufficient condition for when a language is regular. As a corollary of this
   161   and sufficient condition for when a language is regular. As a corollary of this
   158   theorem we can easily establish the usual closure properties, including 
   162   theorem we can easily establish the usual closure properties, including 
   159   complementation, for regular languages.\smallskip
   163   complementation, for regular languages.\smallskip
   160   
   164   
   161   \noindent
   165   \noindent
   259   @{thm (lhs) L_rexp.simps(6)[where r="r"]} & @{text "\<equiv>"} &
   263   @{thm (lhs) L_rexp.simps(6)[where r="r"]} & @{text "\<equiv>"} &
   260       @{thm (rhs) L_rexp.simps(6)[where r="r"]}\\
   264       @{thm (rhs) L_rexp.simps(6)[where r="r"]}\\
   261   \end{tabular}
   265   \end{tabular}
   262   \end{tabular}
   266   \end{tabular}
   263   \end{center}
   267   \end{center}
       
   268 
   264 *}
   269 *}
   265 
   270 
   266 section {* Finite Partitions Imply Regularity of a Language *}
   271 section {* Finite Partitions Imply Regularity of a Language *}
   267 
   272 
   268 text {*
   273 text {*
       
   274   \begin{definition}[Myhill-Nerode Relation]\mbox{}\\
       
   275   @{thm str_eq_rel_def[simplified]}
       
   276   \end{definition}
       
   277 
       
   278   \begin{definition} @{text "finals A"} are the equivalence classes that contain
       
   279   strings from @{text A}\\
       
   280   @{thm finals_def}
       
   281   \end{definition}
       
   282 
       
   283   @{thm lang_is_union_of_finals}
       
   284 
       
   285 
   269   \begin{theorem}
   286   \begin{theorem}
   270   Given a language @{text A}.
   287   Given a language @{text A}.
   271   @{thm[mode=IfThen] hard_direction[where Lang="A"]}
   288   @{thm[mode=IfThen] hard_direction}
   272   \end{theorem}
   289   \end{theorem}
   273 *}
   290 *}
   274 
   291 
   275 section {* Regular Expressions Generate Finitely Many Partitions *}
   292 section {* Regular Expressions Generate Finitely Many Partitions *}
   276 
   293