Paper/Paper.thy
changeset 77 63bc9f9d96ba
parent 75 d63baacbdb16
child 79 bba9c80735f9
equal deleted inserted replaced
76:1589bf5c1ad8 77:63bc9f9d96ba
   149   different approach to regular languages. Instead of defining a regular language as one 
   149   different approach to regular languages. Instead of defining a regular language as one 
   150   where there exists an automaton that recognises all strings of the language, we define 
   150   where there exists an automaton that recognises all strings of the language, we define 
   151   a regular language as:
   151   a regular language as:
   152 
   152 
   153   \begin{definition}[A Regular Language]
   153   \begin{definition}[A Regular Language]
   154   A language @{text A} is regular, provided there is a regular expression that matches all
   154   A language @{text A} is \emph{regular}, provided there is a regular expression that matches all
   155   strings of @{text "A"}.
   155   strings of @{text "A"}.
   156   \end{definition}
   156   \end{definition}
   157   
   157   
   158   \noindent
   158   \noindent
   159   The reason is that regular expressions, unlike graphs and matrices, can
   159   The reason is that regular expressions, unlike graphs and matrices, can
   214   @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined 
   214   @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined 
   215   as @{text "{y | y \<approx> x}"}.
   215   as @{text "{y | y \<approx> x}"}.
   216 
   216 
   217 
   217 
   218   Central to our proof will be the solution of equational systems
   218   Central to our proof will be the solution of equational systems
   219   involving regular expressions. For this we will use Arden's lemma \cite{Brzozowski64}
   219   involving sets of languages. For this we will use Arden's lemma \cite{Brzozowski64}
   220   which solves equations of the form @{term "X = A ;; X \<union> B"} provided
   220   which solves equations of the form @{term "X = A ;; X \<union> B"} provided
   221   @{term "[] \<notin> A"}. However we will need the following ``reverse'' 
   221   @{term "[] \<notin> A"}. However we will need the following ``reverse'' 
   222   version of Arden's lemma.
   222   version of Arden's lemma.
   223 
   223 
   224   \begin{lemma}[Reverse Arden's Lemma]\label{arden}\mbox{}\\
   224   \begin{lemma}[Reverse Arden's Lemma]\label{arden}\mbox{}\\
   295 *}
   295 *}
   296 
   296 
   297 section {* Finite Partitions Imply Regularity of a Language *}
   297 section {* Finite Partitions Imply Regularity of a Language *}
   298 
   298 
   299 text {*
   299 text {*
   300   The central definition in the Myhill-Nerode theorem is the
   300   The key definition in the Myhill-Nerode theorem is the
   301   \emph{Myhill-Nerode relation}, which states that w.r.t.~a language two 
   301   \emph{Myhill-Nerode relation}, which states that w.r.t.~a language two 
   302   strings are related, provided there is no distinguishing extension in this
   302   strings are related, provided there is no distinguishing extension in this
   303   language. This can be defined as:
   303   language. This can be defined as:
   304 
   304 
   305   \begin{definition}[Myhill-Nerode Relation]\mbox{}\\
   305   \begin{definition}[Myhill-Nerode Relation]\mbox{}\\
   309   \noindent
   309   \noindent
   310   It is easy to see that @{term "\<approx>A"} is an equivalence relation, which
   310   It is easy to see that @{term "\<approx>A"} is an equivalence relation, which
   311   partitions the set of all strings, @{text "UNIV"}, into a set of disjoint
   311   partitions the set of all strings, @{text "UNIV"}, into a set of disjoint
   312   equivalence classes. One direction of the Myhill-Nerode theorem establishes 
   312   equivalence classes. One direction of the Myhill-Nerode theorem establishes 
   313   that if there are finitely many equivalence classes, then the language is 
   313   that if there are finitely many equivalence classes, then the language is 
   314   regular. In our setting we have therefore
   314   regular. In our setting we therefore have to show:
   315   
   315   
   316   \begin{theorem}\label{myhillnerodeone}
   316   \begin{theorem}\label{myhillnerodeone}
   317   @{thm[mode=IfThen] hard_direction}
   317   @{thm[mode=IfThen] hard_direction}
   318   \end{theorem}
   318   \end{theorem}
   319 
   319 
   324   \begin{equation} 
   324   \begin{equation} 
   325   @{thm finals_def}
   325   @{thm finals_def}
   326   \end{equation}
   326   \end{equation}
   327 
   327 
   328   \noindent
   328   \noindent
   329   It is straightforward to show that @{thm lang_is_union_of_finals} holds. 
   329   It is straightforward to show that @{thm lang_is_union_of_finals} and 
       
   330   @{thm finals_included_in_UNIV} hold. 
   330   Therefore if we know that there exists a regular expression for every
   331   Therefore if we know that there exists a regular expression for every
   331   equivalence class in @{term "finals A"} (which by assumption must be
   332   equivalence class in @{term "finals A"} (which by assumption must be
   332   a finite set), then we can just combine these regular expressions with @{const ALT}
   333   a finite set), then we can combine these regular expressions with @{const ALT}
   333   and obtain a regular expression that matches every string in @{text A}.
   334   and obtain a regular expression that matches every string in @{text A}.
   334 
   335 
   335 
   336 
   336   We prove Thm.~\ref{myhillnerodeone} by calculating a regular expression for
   337   We prove Thm.~\ref{myhillnerodeone} by giving a method that can calculate a
   337   \emph{all} equivalence classes, not just the ones in @{term "finals A"}. We
   338   regular expression for \emph{every} equivalence classe, not just the ones 
       
   339   in @{term "finals A"}. We
   338   first define a notion of \emph{transition} between equivalence classes
   340   first define a notion of \emph{transition} between equivalence classes
   339   %
   341   %
   340   \begin{equation} 
   342   \begin{equation} 
   341   @{thm transition_def}
   343   @{thm transition_def}
   342   \end{equation}
   344   \end{equation}
   343 
   345 
   344   \noindent
   346   \noindent
   345   which means that if we concatenate all strings matching the regular expression @{text r} 
   347   which means that if we concatenate all strings matching the regular expression @{text r} 
   346   to the end of all strings in the equivalence class @{text Y}, we obtain a subset of 
   348   to the end of all strings in the equivalence class @{text Y}, we obtain a subset of 
   347   @{text X}. Note that we do not define any automaton here, we merely relate two sets
   349   @{text X}. Note that we do not define an automaton here, we merely relate two sets
   348   w.r.t.~a regular expression. 
   350   (w.r.t.~a regular expression). 
   349   
   351   
   350   Next we build an equational system that
   352   Next we build an equational system that
   351   contains an equation for each equivalence class. Suppose we have 
   353   contains an equation for each equivalence class. Suppose we have 
   352   the equivalence classes @{text "X\<^isub>1,\<dots>,X\<^isub>n"}, there must be one and only one that
   354   the equivalence classes @{text "X\<^isub>1,\<dots>,X\<^isub>n"}, there must be one and only one that
   353   contains the empty string @{text "[]"} (since equivalence classes are disjoint).
   355   contains the empty string @{text "[]"} (since equivalence classes are disjoint).
   354   Let us assume @{text "[] \<in> X\<^isub>1"}. We can build the following equational system
   356   Let us assume @{text "[] \<in> X\<^isub>1"}. We build the following equational system
   355   
   357   
   356   \begin{center}
   358   \begin{center}
   357   \begin{tabular}{rcl}
   359   \begin{tabular}{rcl}
   358   @{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) + \<dots> + (Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) + \<lambda>(EMPTY)"} \\
   360   @{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) + \<dots> + (Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) + \<lambda>(EMPTY)"} \\
   359   @{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, CHAR c\<^isub>2\<^isub>1) + \<dots> + (Y\<^isub>2\<^isub>o, CHAR c\<^isub>2\<^isub>o)"} \\
   361   @{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, CHAR c\<^isub>2\<^isub>1) + \<dots> + (Y\<^isub>2\<^isub>o, CHAR c\<^isub>2\<^isub>o)"} \\
   365   \noindent
   367   \noindent
   366   where the pairs @{text "(Y\<^isub>i\<^isub>j, r\<^isub>i\<^isub>j)"} stand for all transitions 
   368   where the pairs @{text "(Y\<^isub>i\<^isub>j, r\<^isub>i\<^isub>j)"} stand for all transitions 
   367   @{term "Y\<^isub>i\<^isub>j \<Turnstile>r\<^isub>i\<^isub>j\<Rightarrow> X\<^isub>i"}.  The term @{text "\<lambda>(EMPTY)"} acts as a marker for the equivalence
   369   @{term "Y\<^isub>i\<^isub>j \<Turnstile>r\<^isub>i\<^isub>j\<Rightarrow> X\<^isub>i"}.  The term @{text "\<lambda>(EMPTY)"} acts as a marker for the equivalence
   368   class containing @{text "[]"}. (Note that we mark, roughly speaking, the
   370   class containing @{text "[]"}. (Note that we mark, roughly speaking, the
   369   single ``initial'' state in the equational system, which is different from
   371   single ``initial'' state in the equational system, which is different from
   370   the method by Brzozowski \cite{Brzozowski64}, which marks the ``terminal''
   372   the method by Brzozowski \cite{Brzozowski64}, since for his purposes he needs to mark 
   371   states.) Overloading the function @{text L} for the two kinds of terms in the 
   373   the ``terminal'' states.) Overloading the function @{text L} for the two kinds of terms in the 
   372   equational system as follows
   374   equational system as follows
   373   
   375   
   374   \begin{center}
   376   \begin{center}
   375   @{thm L_rhs_e.simps(1)[where r="r", THEN eq_reflection]}\hspace{20mm}
   377   @{thm L_rhs_e.simps(2)[where X="Y" and r="r", THEN eq_reflection]}\hspace{10mm}
   376   @{thm L_rhs_e.simps(2)[where X="X" and r="r", THEN eq_reflection]}
   378   @{thm L_rhs_e.simps(1)[where r="r", THEN eq_reflection]}
   377   \end{center}
   379   \end{center}
   378 
   380 
   379   \noindent
   381   \noindent
   380   we can prove for @{text "X\<^isub>2\<^isub>.\<^isub>.\<^isub>n"} that the following equations
   382   we can prove for @{text "X\<^isub>2\<^isub>.\<^isub>.\<^isub>n"} that the following equations
   381   %
   383   %
   389   \begin{equation}\label{inv2}
   391   \begin{equation}\label{inv2}
   390   @{text "X\<^isub>1 = L(Y\<^isub>i\<^isub>1, CHAR c\<^isub>i\<^isub>1) \<union> \<dots> \<union> L(Y\<^isub>i\<^isub>p, CHAR c\<^isub>i\<^isub>p) \<union> L(\<lambda>(EMPTY))"}.
   392   @{text "X\<^isub>1 = L(Y\<^isub>i\<^isub>1, CHAR c\<^isub>i\<^isub>1) \<union> \<dots> \<union> L(Y\<^isub>i\<^isub>p, CHAR c\<^isub>i\<^isub>p) \<union> L(\<lambda>(EMPTY))"}.
   391   \end{equation}
   393   \end{equation}
   392 
   394 
   393   \noindent
   395   \noindent
   394   hold. The reason for adding the @{text \<lambda>}-marker to our equational system is 
   396   The reason for adding the @{text \<lambda>}-marker to our equational system is 
   395   to obtain this equations, which only holds in this form since none of 
   397   to obtain this equation, which only holds in this form since none of 
   396   the other terms contain the empty string. Our proof of Thm.~\ref{myhillnerodeone}
   398   the other terms contain the empty string. 
       
   399 
       
   400 
       
   401   Our proof of Thm.~\ref{myhillnerodeone}
   397   will be by transforming the equational system into a \emph{solved form}
   402   will be by transforming the equational system into a \emph{solved form}
   398   maintaining the invariants \eqref{inv1} and \eqref{inv2}. From the solved
   403   maintaining the invariants \eqref{inv1} and \eqref{inv2}. From the solved
   399   form we will be able to read off the regular expressions using our 
   404   form we will be able to read off the regular expressions using our 
   400   variant of Arden's Lemma (Lem.~\ref{arden}).
   405   variant of Arden's Lemma (Lem.~\ref{arden}).
   401 
   406