Paper/Paper.thy
changeset 88 1436fc451bb9
parent 86 6457e668dee5
child 89 42af13d194c9
equal deleted inserted replaced
87:6a0efaabde19 88:1436fc451bb9
    24   UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and
    24   UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and
    25   L ("\<^raw:\ensuremath{\cal{L}}>'(_')" [0] 101) and
    25   L ("\<^raw:\ensuremath{\cal{L}}>'(_')" [0] 101) and
    26   Lam ("\<lambda>'(_')" [100] 100) and 
    26   Lam ("\<lambda>'(_')" [100] 100) and 
    27   Trn ("_, _" [100, 100] 100) and 
    27   Trn ("_, _" [100, 100] 100) and 
    28   EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and
    28   EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and
    29   transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100)
    29   transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100) and
       
    30   Setalt ("\<^raw:\ensuremath{\bigplus}>_" [1000] 999)
    30 (*>*)
    31 (*>*)
    31 
    32 
    32 
    33 
    33 section {* Introduction *}
    34 section {* Introduction *}
    34 
    35 
   114   \end{tabular}
   115   \end{tabular}
   115   \end{center}
   116   \end{center}
   116 
   117 
   117   \noindent
   118   \noindent
   118   On ``paper'' we can define the corresponding graph in terms of the disjoint 
   119   On ``paper'' we can define the corresponding graph in terms of the disjoint 
   119   union of the state nodes. Unfortunately in HOL, the definition for disjoint 
   120   union of the state nodes. Unfortunately in HOL, the standard definition for disjoint 
   120   union, namely 
   121   union, namely 
   121   %
   122   %
   122   \begin{equation}\label{disjointunion}
   123   \begin{equation}\label{disjointunion}
   123   @{term "UPLUS A\<^isub>1 A\<^isub>2 \<equiv> {(1, x) | x. x \<in> A\<^isub>1} \<union> {(2, y) | y. y \<in> A\<^isub>2}"}
   124   @{term "UPLUS A\<^isub>1 A\<^isub>2 \<equiv> {(1, x) | x. x \<in> A\<^isub>1} \<union> {(2, y) | y. y \<in> A\<^isub>2}"}
   124   \end{equation}
   125   \end{equation}
   135   establishing that properties are invariant under renaming. Similarly,
   136   establishing that properties are invariant under renaming. Similarly,
   136   connecting two automata represented as matrices results in very adhoc
   137   connecting two automata represented as matrices results in very adhoc
   137   constructions, which are not pleasant to reason about.
   138   constructions, which are not pleasant to reason about.
   138 
   139 
   139   Functions are much better supported in Isabelle/HOL, but they still lead to similar
   140   Functions are much better supported in Isabelle/HOL, but they still lead to similar
   140   problems as with graphs.  Composing two non-deterministic automata in parallel
   141   problems as with graphs.  Composing, for example, two non-deterministic automata in parallel
   141   poses still the problem of how to implement disjoint unions. Nipkow \cite{Nipkow98} 
   142   poses again the problem of how to implement disjoint unions. Nipkow \cite{Nipkow98} 
   142   dismisses the option using identities, because it leads to messy proofs. He
   143   dismisses the option of using identities, because it leads to ``messy proofs''. He
   143   opts for a variant of \eqref{disjointunion}, but writes
   144   opts for a variant of \eqref{disjointunion}, but writes
   144 
   145 
   145   \begin{quote}
   146   \begin{quote}
   146   \it ``If the reader finds the above treatment in terms of bit lists revoltingly
   147   \it ``If the reader finds the above treatment in terms of bit lists revoltingly
   147   concrete, I cannot disagree.''
   148   concrete, I cannot disagree.''
   170   A language @{text A} is \emph{regular}, provided there is a regular expression that matches all
   171   A language @{text A} is \emph{regular}, provided there is a regular expression that matches all
   171   strings of @{text "A"}.
   172   strings of @{text "A"}.
   172   \end{definition}
   173   \end{definition}
   173   
   174   
   174   \noindent
   175   \noindent
   175   The reason is that regular expressions, unlike graphs and matrices, can
   176   The reason is that regular expressions, unlike graphs, matrices and functons, can
   176   be easily defined as inductive datatype. Consequently a corresponding reasoning 
   177   be easily defined as inductive datatype. Consequently a corresponding reasoning 
   177   infrastructure comes for free. This has recently been exploited in HOL4 with a formalisation
   178   infrastructure comes for free. This has recently been exploited in HOL4 with a formalisation
   178   of regular expression matching based on derivatives \cite{OwensSlind08}.  The purpose of this paper is to
   179   of regular expression matching based on derivatives \cite{OwensSlind08}.  The purpose of this paper is to
   179   show that a central result about regular languages---the Myhill-Nerode theorem---can 
   180   show that a central result about regular languages---the Myhill-Nerode theorem---can 
   180   be recreated by only using regular expressions. This theorem gives necessary
   181   be recreated by only using regular expressions. This theorem gives necessary
   181   and sufficient conditions for when a language is regular. As a corollary of this
   182   and sufficient conditions for when a language is regular. As a corollary of this
   182   theorem we can easily establish the usual closure properties, including 
   183   theorem we can easily establish the usual closure properties, including 
   183   complementation, for regular languages.\smallskip
   184   complementation, for regular languages.\smallskip
   184   
   185   
   185   \noindent
   186   \noindent
   186   {\bf Contributions:} To our knowledge, our proof of the Myhill-Nerode theorem is the
   187   {\bf Contributions:} 
       
   188   There is an extensive literature on regular languages.
       
   189   To our knowledge, our proof of the Myhill-Nerode theorem is the
   187   first that is based on regular expressions, only. We prove the part of this theorem 
   190   first that is based on regular expressions, only. We prove the part of this theorem 
   188   stating that a regular expression has only finitely many partitions using certain 
   191   stating that a regular expression has only finitely many partitions using certain 
   189   tagging-functions. Again to our best knowledge, these tagging functions have
   192   tagging-functions. Again to our best knowledge, these tagging functions have
   190   not been used before to establish the Myhill-Nerode theorem.
   193   not been used before to establish the Myhill-Nerode theorem.
   191 *}
   194 *}
   209   \end{center}
   212   \end{center}
   210 
   213 
   211   \noindent
   214   \noindent
   212   where @{text "@"} is the usual list-append operation. The Kleene-star of a language @{text A}
   215   where @{text "@"} is the usual list-append operation. The Kleene-star of a language @{text A}
   213   is defined as the union over all powers, namely @{thm Star_def}. In the paper
   216   is defined as the union over all powers, namely @{thm Star_def}. In the paper
   214   we will often make use of the following properties.
   217   we will make use of the following properties of these constructions.
   215   
   218   
   216   \begin{proposition}\label{langprops}\mbox{}\\
   219   \begin{proposition}\label{langprops}\mbox{}\\
   217   \begin{tabular}{@ {}ll@ {\hspace{10mm}}ll}
   220   \begin{tabular}{@ {}ll@ {\hspace{10mm}}ll}
   218   (i)   & @{thm star_cases}      & (ii)  & @{thm[mode=IfThen] pow_length}\\
   221   (i)   & @{thm star_cases}      & (ii)  & @{thm[mode=IfThen] pow_length}\\
   219   (iii) & @{thm seq_Union_left}  & 
   222   (iii) & @{thm seq_Union_left}  & 
   220   \end{tabular}
   223   \end{tabular}
   221   \end{proposition}
   224   \end{proposition}
   222 
   225 
   223   \noindent
   226   \noindent
   224   We omit the proofs of these properties, but invite the reader to consult
   227   We omit the proofs, but invite the reader to consult
   225   our formalisation.\footnote{Available at ???}
   228   our formalisation.\footnote{Available at ???}
   226 
   229 
   227 
   230 
   228   The notation for the quotient of a language @{text A} according to an 
   231   The notation for the quotient of a language @{text A} according to an 
   229   equivalence relation @{term REL} is @{term "A // REL"}. We will write 
   232   equivalence relation @{term REL} is in Isabelle/HOL @{term "A // REL"}. We will write 
   230   @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined 
   233   @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined 
   231   as @{text "{y | y \<approx> x}"}.
   234   as @{text "{y | y \<approx> x}"}.
   232 
   235 
   233 
   236 
   234   Central to our proof will be the solution of equational systems
   237   Central to our proof will be the solution of equational systems
   272   implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Prop.~\ref{langprops}@{text "(iii)"} 
   275   implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Prop.~\ref{langprops}@{text "(iii)"} 
   273   this is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed
   276   this is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed
   274   \end{proof}
   277   \end{proof}
   275 
   278 
   276   \noindent
   279   \noindent
   277   Regular expressions are defined as the following inductive datatype
   280   Regular expressions are defined as the inductive datatype
   278 
   281 
   279   \begin{center}
   282   \begin{center}
   280   @{text r} @{text "::="}
   283   @{text r} @{text "::="}
   281   @{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   284   @{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   282   @{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   285   @{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   285   @{term "ALT r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   288   @{term "ALT r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   286   @{term "STAR r"}
   289   @{term "STAR r"}
   287   \end{center}
   290   \end{center}
   288 
   291 
   289   \noindent
   292   \noindent
   290   and the language matched by a regular expression is defined as:
   293   and the language matched by a regular expression is defined as
   291 
   294 
   292   \begin{center}
   295   \begin{center}
   293   \begin{tabular}{c@ {\hspace{10mm}}c}
   296   \begin{tabular}{c@ {\hspace{10mm}}c}
   294   \begin{tabular}{rcl}
   297   \begin{tabular}{rcl}
   295   @{thm (lhs) L_rexp.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) L_rexp.simps(1)}\\
   298   @{thm (lhs) L_rexp.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) L_rexp.simps(1)}\\
   306       @{thm (rhs) L_rexp.simps(6)[where r="r"]}\\
   309       @{thm (rhs) L_rexp.simps(6)[where r="r"]}\\
   307   \end{tabular}
   310   \end{tabular}
   308   \end{tabular}
   311   \end{tabular}
   309   \end{center}
   312   \end{center}
   310 
   313 
   311 
   314   \noindent
       
   315   Given a set or regular expressions @{text rs}, we will need the operation of generating 
       
   316   a regular expressions that matches all languages of @{text rs}. We only need the existence
       
   317   of such an regular expressions therefore we use Isabelle's @{const "fold_graph"} and Hilbert's
       
   318   @{text "\<epsilon>"} to define @{term "\<Uplus>rs"} which, roughly speaking, folds @{const ALT} over the 
       
   319   set @{text rs} with @{const NULL} for the empty set. We can prove that for finite sets @{text rs}
       
   320 
       
   321   \begin{center}
       
   322   @{thm (lhs) folds_alt_simp}@{text "= \<Union> (\<calL> ` rs)"}
       
   323   \end{center}
       
   324 
       
   325   \noindent
       
   326   holds. (whereby @{text "\<calL> ` rs"} stands for the 
       
   327   image of the set @{text rs} under function @{text "\<calL>"}).
       
   328   
   312 
   329 
   313 *}
   330 *}
   314 
   331 
   315 section {* Finite Partitions Imply Regularity of a Language *}
   332 section {* Finite Partitions Imply Regularity of a Language *}
   316 
   333