diff -r 2335fcb96052 -r d63baacbdb16 Paper/Paper.thy --- a/Paper/Paper.thy Mon Feb 07 13:17:01 2011 +0000 +++ b/Paper/Paper.thy Mon Feb 07 20:30:10 2011 +0000 @@ -14,6 +14,7 @@ notation (latex output) str_eq_rel ("\\<^bsub>_\<^esub>") and + str_eq ("_ \\<^bsub>_\<^esub> _") and Seq (infixr "\" 100) and Star ("_\<^bsup>\\<^esup>") and pow ("_\<^bsup>_\<^esup>" [100, 100] 100) and @@ -21,9 +22,11 @@ quotient ("_ \<^raw:\ensuremath{\!\sslash\!}> _" [90, 90] 90) and REL ("\") and UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and - L ("L '(_')" [0] 101) and + L ("L'(_')" [0] 101) and + Lam ("\'(_')" [100] 100) and + Trn ("_, _" [100, 100] 100) and EClass ("\_\\<^bsub>_\<^esub>" [100, 100] 100) and - transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longrightarrow}}> _" [100, 100, 100] 100) + transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100) (*>*) @@ -213,12 +216,12 @@ Central to our proof will be the solution of equational systems - involving regular expressions. For this we will use Arden's lemma \cite{} + involving regular expressions. For this we will use Arden's lemma \cite{Brzozowski64} which solves equations of the form @{term "X = A ;; X \ B"} provided @{term "[] \ A"}. However we will need the following ``reverse'' version of Arden's lemma. - \begin{lemma}[Reverse Arden's Lemma]\mbox{}\\ + \begin{lemma}[Reverse Arden's Lemma]\label{arden}\mbox{}\\ If @{thm (prem 1) ardens_revised} then @{thm (lhs) ardens_revised} has the unique solution @{thm (rhs) ardens_revised}. @@ -226,7 +229,7 @@ \begin{proof} For the right-to-left direction we assume @{thm (rhs) ardens_revised} and show - that @{thm (lhs) ardens_revised} holds. From Prop.~\ref{langprops}$(i)$ + that @{thm (lhs) ardens_revised} holds. From Prop.~\ref{langprops}@{text "(i)"} we have @{term "A\ = {[]} \ A ;; A\"}, which is equal to @{term "A\ = {[]} \ A\ ;; A"}. Adding @{text B} to both sides gives @{term "B ;; A\ = B ;; ({[]} \ A\ ;; A)"}, whose right-hand side @@ -245,12 +248,12 @@ of @{text "\"}. For the inclusion in the other direction we assume a string @{text s} with length @{text k} is element in @{text X}. Since @{thm (prem 1) ardens_revised} - we know by Prop.~\ref{langprops}$(ii)$ that + we know by Prop.~\ref{langprops}@{text "(ii)"} that @{term "s \ X ;; (A \ Suc k)"} since its length is only @{text k} (the strings in @{term "X ;; (A \ Suc k)"} are all longer). From @{text "(*)"} it follows then that @{term s} must be element in @{term "(\m\{0..k}. B ;; (A \ m))"}. This in turn - implies that @{term s} is in @{term "(\n. B ;; (A \ n))"}. Using Prop.~\ref{langprops}$(iii)$ + implies that @{term s} is in @{term "(\n. B ;; (A \ n))"}. Using Prop.~\ref{langprops}@{text "(iii)"} this is equal to @{term "B ;; A\"}, as we needed to show.\qed \end{proof} @@ -294,36 +297,108 @@ section {* Finite Partitions Imply Regularity of a Language *} text {* + The central definition in the Myhill-Nerode theorem is the + \emph{Myhill-Nerode relation}, which states that w.r.t.~a language two + strings are related, provided there is no distinguishing extension in this + language. This can be defined as: + \begin{definition}[Myhill-Nerode Relation]\mbox{}\\ - @{thm str_eq_rel_def[simplified]} + @{thm str_eq_def[simplified str_eq_rel_def Pair_Collect]} \end{definition} \noindent - It is easy to see that @{term "\A"} is an equivalence relation, which partitions - the set of all string, @{text "UNIV"}, into a set of equivalence classed. We define - the set @{term "finals A"} as those equivalence classes that contain strings of - @{text A}, namely + It is easy to see that @{term "\A"} is an equivalence relation, which + partitions the set of all strings, @{text "UNIV"}, into a set of disjoint + equivalence classes. One direction of the Myhill-Nerode theorem establishes + that if there are finitely many equivalence classes, then the language is + regular. In our setting we have therefore + + \begin{theorem}\label{myhillnerodeone} + @{thm[mode=IfThen] hard_direction} + \end{theorem} + \noindent + To prove this theorem, we define the set @{term "finals A"} as those equivalence + classes that contain strings of @{text A}, namely + % \begin{equation} @{thm finals_def} \end{equation} \noindent - It is easy to show that @{thm lang_is_union_of_finals} holds. We can also define - a notion of \emph{transition} between equivalence classes as + It is straightforward to show that @{thm lang_is_union_of_finals} holds. + Therefore if we know that there exists a regular expression for every + equivalence class in @{term "finals A"} (which by assumption must be + a finite set), then we can just combine these regular expressions with @{const ALT} + and obtain a regular expression that matches every string in @{text A}. + + We prove Thm.~\ref{myhillnerodeone} by calculating a regular expression for + \emph{all} equivalence classes, not just the ones in @{term "finals A"}. We + first define a notion of \emph{transition} between equivalence classes + % \begin{equation} @{thm transition_def} \end{equation} \noindent - which means if we add the character @{text c} to all strings in the equivalence - class @{text Y} HERE + which means that if we concatenate all strings matching the regular expression @{text r} + to the end of all strings in the equivalence class @{text Y}, we obtain a subset of + @{text X}. Note that we do not define any automaton here, we merely relate two sets + w.r.t.~a regular expression. + + Next we build an equational system that + contains an equation for each equivalence class. Suppose we have + the equivalence classes @{text "X\<^isub>1,\,X\<^isub>n"}, there must be one and only one that + contains the empty string @{text "[]"} (since equivalence classes are disjoint). + Let us assume @{text "[] \ X\<^isub>1"}. We can build the following equational system + + \begin{center} + \begin{tabular}{rcl} + @{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) + \ + (Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) + \(EMPTY)"} \\ + @{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, CHAR c\<^isub>2\<^isub>1) + \ + (Y\<^isub>2\<^isub>o, CHAR c\<^isub>2\<^isub>o)"} \\ + & $\vdots$ \\ + @{text "X\<^isub>n"} & @{text "="} & @{text "(Y\<^isub>n\<^isub>1, CHAR c\<^isub>n\<^isub>1) + \ + (Y\<^isub>n\<^isub>q, CHAR c\<^isub>n\<^isub>q)"}\\ + \end{tabular} + \end{center} - \begin{theorem} - Given a language @{text A}. - @{thm[mode=IfThen] hard_direction} - \end{theorem} + \noindent + where the pairs @{text "(Y\<^isub>i\<^isub>j, r\<^isub>i\<^isub>j)"} stand for all transitions + @{term "Y\<^isub>i\<^isub>j \r\<^isub>i\<^isub>j\ X\<^isub>i"}. The term @{text "\(EMPTY)"} acts as a marker for the equivalence + class containing @{text "[]"}. (Note that we mark, roughly speaking, the + single ``initial'' state in the equational system, which is different from + the method by Brzozowski \cite{Brzozowski64}, which marks the ``terminal'' + states.) Overloading the function @{text L} for the two kinds of terms in the + equational system as follows + + \begin{center} + @{thm L_rhs_e.simps(1)[where r="r", THEN eq_reflection]}\hspace{20mm} + @{thm L_rhs_e.simps(2)[where X="X" and r="r", THEN eq_reflection]} + \end{center} + + \noindent + we can prove for @{text "X\<^isub>2\<^isub>.\<^isub>.\<^isub>n"} that the following equations + % + \begin{equation}\label{inv1} + @{text "X\<^isub>i = L(Y\<^isub>i\<^isub>1, CHAR c\<^isub>i\<^isub>1) \ \ \ L(Y\<^isub>i\<^isub>q, CHAR c\<^isub>i\<^isub>q)"}. + \end{equation} + + \noindent + hold. Similarly for @{text "X\<^isub>1"} we can show the following equation + % + \begin{equation}\label{inv2} + @{text "X\<^isub>1 = L(Y\<^isub>i\<^isub>1, CHAR c\<^isub>i\<^isub>1) \ \ \ L(Y\<^isub>i\<^isub>p, CHAR c\<^isub>i\<^isub>p) \ L(\(EMPTY))"}. + \end{equation} + + \noindent + hold. The reason for adding the @{text \}-marker to our equational system is + to obtain this equations, which only holds in this form since none of + the other terms contain the empty string. Our proof of Thm.~\ref{myhillnerodeone} + will be by transforming the equational system into a \emph{solved form} + maintaining the invariants \eqref{inv1} and \eqref{inv2}. From the solved + form we will be able to read off the regular expressions using our + variant of Arden's Lemma (Lem.~\ref{arden}). + *} section {* Regular Expressions Generate Finitely Many Partitions *}