# HG changeset patch # User urbanc # Date 1311677906 0 # Node ID d371536861bc63fdd1e9fb14a6d4d9445eb63642 # Parent 21ee3a852a02d3844cb751f3e749537ef6542f2c more on the introduction of the journal paper diff -r 21ee3a852a02 -r d371536861bc Journal/Paper.thy --- a/Journal/Paper.thy Mon Jul 25 18:00:52 2011 +0000 +++ b/Journal/Paper.thy Tue Jul 26 10:58:26 2011 +0000 @@ -29,16 +29,14 @@ "_Collect p P" <= "{p|xs. P}" "_CollectIn p A P" <= "{p : A. P}" -abbreviation "NULL \ Zero" -abbreviation "EMPTY \ One" -abbreviation "CHAR \ Atom" -abbreviation "ALT \ Plus" -abbreviation "SEQ \ Times" +abbreviation "ZERO \ Zero" +abbreviation "ONE \ One" +abbreviation "ATOM \ Atom" +abbreviation "PLUS \ Plus" +abbreviation "TIMES \ Times" abbreviation "STAR \ Star" -ML {* @{term "op ^^"} *} - notation (latex output) str_eq_rel ("\\<^bsub>_\<^esub>") and str_eq ("_ \\<^bsub>_\<^esub> _") and @@ -132,11 +130,11 @@ (e.g.~\cite{Kozen97, HopcroftUllman69}). It seems natural to exercise theorem provers by formalising the theorems and by verifying formally the algorithms. A popular choice for a theorem prover would be one based on Higher-Order Logic - (HOL), such as HOL4, HOLlight and Isabelle/HOL. For our development we will - use the latter theorem prover. One distinguishing feature of HOL is it's - type system based on Church's Simple Theory of Types \cite{Church40}. The - limitations of this type system are one of the main motivations - behind the work presented in this paper. + (HOL), for example HOL4, HOLlight and Isabelle/HOL. For our development + we will use the latter. One distinguishing feature of HOL is it's + type system, which is based on Church's Simple Theory of Types \cite{Church40}. The + limitations of this type system are one of the underlying motivations for the + work presented in this paper. The typical approach to regular languages is to introduce finite automata and then define everything in terms of them. For @@ -225,12 +223,13 @@ \end{equation} \noindent - changes the type---the disjoint union is not a set, but a set of pairs. - Using this definition for disjoint union means we do not have a single type for automata - and hence will not be able to state certain properties about \emph{all} - automata, since there is no type quantification available in HOL (unlike in Coq, for example). - As a result, we would not be able to define a language being regular - in terms of the existence of an automata. + changes the type---the disjoint union is not a set, but a set of + pairs. Using this definition for disjoint union means we do not have a + single type for automata. As a result we will not be able to define a regular + language as one for which there exists an automaton that recognises all its + strings. Similarly, we cannot state properties about \emph{all} automata, + since there is no type quantification available in HOL (unlike in Coq, for + example). An alternative, which provides us with a single type for automata, is to give every state node an identity, for example a natural @@ -283,24 +282,24 @@ larger formalisations of automata theory are carried out in Nuprl \cite{Constable00} and in Coq \cite{Filliatre97}. - Also one might consider the Myhill-Nerode theorem as well-worn stock - material for a computer scientists, but paper proofs of this theorem contain some - subtle side-conditions which are easily overlooked and which make formal reasoning - painful. For example in Kozen's proof \cite{Kozen97} it is not sufficient to - have just any automata recognising a regular language, but one which does - not have inaccessible states. This means if we define a regular language - for which \emph{any} finite automaton exists, then one has to ensure that - another equivalent can be found satisfying the side-conditions. Similarly - Brozowski mentiones in \cite{Brozowski10} side-conditions of finite automata - in connection of state-complexity: there it is required that automata - must be complete in the sense of having a total transition function. - Furthermore, if a `sink' state is present which accepts no word, then there - must be only one such state. . . . - Such 'little' lemmas make formalisation of these results in a theroem - prover hair-pulling experiences. + One might also consider the Myhill-Nerode theorem as well-worn stock + material where everything is clear. However, paper proofs of this theorem + often involve subtle side-conditions which are easily overlooked, but which + make formal reasoning rather painful. For example Kozen's proof requires + that the automata do not have inaccessible states \cite{Kozen97}. Another + subtle side-condition is completeness of automata: + automata need to have total transition functions and at most one `sink' + state from which there is no connection to a final state (Brozowski mentions + this side-condition in connection with state complexity + \cite{Brozowski10}). Such side-conditions mean that if we define a regular + language as one for which there exists \emph{any} finite automaton, then we + need a lemma which ensures that another equivalent can be found satisfying the + side-condition. Unfortunately, such `little' and `obvious' lemmas make + formalisations of results in automata theory hair-pulling experiences. + In this paper, we will not attempt to formalise automata theory in - Isabelle/HOL nor attempt to formalise any automata proof from the + Isabelle/HOL nor will we attempt to formalise automata proofs from the literature, but take a different approach to regular languages than is usually taken. Instead of defining a regular language as one where there exists an automaton that recognises all strings of the language, we define a @@ -313,9 +312,10 @@ \noindent The reason is that regular expressions, unlike graphs, matrices and - functions, can be easily defined as inductive datatype. Consequently a - corresponding reasoning infrastructure (like induction or recursion) comes - for free. This has recently been exploited in HOL4 with a formalisation of + functions, can be easily defined as an inductive datatype. No side-conditions + will be needed for regular expressions. Moreover, a reasoning infrastructure + (like induction and recursion) comes for free in HOL-based theorem provers. + This has recently been exploited in HOL4 with a formalisation of regular expression matching based on derivatives \cite{OwensSlind08} and with an equivalence checker for regular expressions in Isabelle/HOL \cite{KraussNipkow11}. The purpose of this paper is to show that a central @@ -329,13 +329,13 @@ {\bf Contributions:} There is an extensive literature on regular languages. To our best knowledge, our proof of the Myhill-Nerode theorem is the first that is based on regular expressions, only. The part of this theorem stating - that finitely many partitions of a language w.r.t.~to the Myhill-Nerode - relation imply that the language is regular is proved by a folklore argument - of solving equational sytems. For the other part we give two proofs: a + that finitely many partitions imply regularity of the language is proved by + an argument about solving equational sytems. This argument seems to be folklore. + For the other part, we give two proofs: a direct proof using certain tagging-functions, and an indirect proof using Antimirov's partial derivatives \cite{Antimirov95} (also earlier russion work). Again to our best knowledge, the tagging-functions have not been used before to - establish the Myhill-Nerode theorem. + establish the Myhill-Nerode theorem. *} @@ -432,11 +432,11 @@ \begin{center} @{text r} @{text "::="} - @{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} - @{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} - @{term "CHAR c"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} - @{term "SEQ r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} - @{term "ALT r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} + @{term ZERO}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} + @{term ONE}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} + @{term "ATOM c"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} + @{term "TIMES r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} + @{term "PLUS r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} @{term "STAR r"} \end{center} @@ -466,8 +466,8 @@ a regular expression that matches the union of all languages of @{text rs}. We only need to know the existence of such a regular expression and therefore we use Isabelle/HOL's @{const "fold_graph"} and Hilbert's - @{text "\"} to define @{term "\rs"}. This operation, roughly speaking, folds @{const ALT} over the - set @{text rs} with @{const NULL} for the empty set. We can prove that for a finite set @{text rs} + @{text "\"} to define @{term "\rs"}. This operation, roughly speaking, folds @{const PLUS} over the + set @{text rs} with @{const ZERO} for the empty set. We can prove that for a finite set @{text rs} % \begin{equation}\label{uplus} \mbox{@{thm (lhs) folds_alt_simp} @{text "= \ (\ ` rs)"}} @@ -567,15 +567,15 @@ \begin{center} \begin{tabular}{rcl} - @{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) + \ + (Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) + \(EMPTY)"} \\ - @{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, CHAR c\<^isub>2\<^isub>1) + \ + (Y\<^isub>2\<^isub>o, CHAR c\<^isub>2\<^isub>o)"} \\ + @{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, ATOM c\<^isub>1\<^isub>1) + \ + (Y\<^isub>1\<^isub>p, ATOM c\<^isub>1\<^isub>p) + \(ONE)"} \\ + @{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, ATOM c\<^isub>2\<^isub>1) + \ + (Y\<^isub>2\<^isub>o, ATOM c\<^isub>2\<^isub>o)"} \\ & $\vdots$ \\ - @{text "X\<^isub>n"} & @{text "="} & @{text "(Y\<^isub>n\<^isub>1, CHAR c\<^isub>n\<^isub>1) + \ + (Y\<^isub>n\<^isub>q, CHAR c\<^isub>n\<^isub>q)"}\\ + @{text "X\<^isub>n"} & @{text "="} & @{text "(Y\<^isub>n\<^isub>1, ATOM c\<^isub>n\<^isub>1) + \ + (Y\<^isub>n\<^isub>q, ATOM c\<^isub>n\<^isub>q)"}\\ \end{tabular} \end{center} \noindent - where the terms @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} + where the terms @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"} stand for all transitions @{term "Y\<^isub>i\<^isub>j \c\<^isub>i\<^isub>j\ X\<^isub>i"}. %The intuition behind the equational system is that every @@ -584,10 +584,10 @@ %are the @{text "Y\<^isub>i\<^isub>j"}; the @{text "c\<^isub>i\<^isub>j"} are the labels of the transitions from these %predecessor states to @{text X\<^isub>i}. There can only be - finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} in a right-hand side + finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"} in a right-hand side since by assumption there are only finitely many equivalence classes and only finitely many characters. - The term @{text "\(EMPTY)"} in the first equation acts as a marker for the initial state, that + The term @{text "\(ONE)"} in the first equation acts as a marker for the initial state, that is the equivalence class containing @{text "[]"}.\footnote{Note that we mark, roughly speaking, the single `initial' state in the equational system, which is different from @@ -598,7 +598,7 @@ be reached by adding a character to the end of @{text Y}. This is also the reason why we have to use our reverse version of Arden's Lemma.} %In our initial equation system there can only be - %finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} in a right-hand side + %finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"} in a right-hand side %since by assumption there are only finitely many %equivalence classes and only finitely many characters. Overloading the function @{text \} for the two kinds of terms in the @@ -614,14 +614,14 @@ and we can prove for @{text "X\<^isub>2\<^isub>.\<^isub>.\<^isub>n"} that the following equations % \begin{equation}\label{inv1} - @{text "X\<^isub>i = \(Y\<^isub>i\<^isub>1, CHAR c\<^isub>i\<^isub>1) \ \ \ \(Y\<^isub>i\<^isub>q, CHAR c\<^isub>i\<^isub>q)"}. + @{text "X\<^isub>i = \(Y\<^isub>i\<^isub>1, ATOM c\<^isub>i\<^isub>1) \ \ \ \(Y\<^isub>i\<^isub>q, ATOM c\<^isub>i\<^isub>q)"}. \end{equation} \noindent hold. Similarly for @{text "X\<^isub>1"} we can show the following equation % \begin{equation}\label{inv2} - @{text "X\<^isub>1 = \(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) \ \ \ \(Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) \ \(\(EMPTY))"}. + @{text "X\<^isub>1 = \(Y\<^isub>1\<^isub>1, ATOM c\<^isub>1\<^isub>1) \ \ \ \(Y\<^isub>1\<^isub>p, ATOM c\<^isub>1\<^isub>p) \ \(\(ONE))"}. \end{equation} \noindent @@ -641,8 +641,8 @@ \mbox{\begin{tabular}{rcl} @{thm (lhs) Init_rhs_def} & @{text "\"} & @{text "if"}~@{term "[] \ X"}\\ - & & @{text "then"}~@{term "{Trn Y (CHAR c) | Y c. Y \ CS \ Y \c\ X} \ {Lam EMPTY}"}\\ - & & @{text "else"}~@{term "{Trn Y (CHAR c)| Y c. Y \ CS \ Y \c\ X}"}\\ + & & @{text "then"}~@{term "{Trn Y (ATOM c) | Y c. Y \ CS \ Y \c\ X} \ {Lam ONE}"}\\ + & & @{text "else"}~@{term "{Trn Y (ATOM c)| Y c. Y \ CS \ Y \c\ X}"}\\ @{thm (lhs) Init_def} & @{text "\"} & @{thm (rhs) Init_def} \end{tabular}} \end{equation} @@ -986,6 +986,7 @@ section {* Myhill-Nerode, Second Part *} text {* + \noindent We will prove in this section the second part of the Myhill-Nerode theorem. It can be formulated in our setting as follows: @@ -999,7 +1000,7 @@ \begin{proof}[Base Cases] - The cases for @{const NULL}, @{const EMPTY} and @{const CHAR} are routine, because + The cases for @{const ZERO}, @{const ONE} and @{const ATOM} are routine, because we can easily establish that \begin{center} @@ -1093,9 +1094,9 @@ Chaining Lem.~\ref{finone} and \ref{fintwo} together, means in order to show that @{term "UNIV // \(L r)"} is finite, we have to find a tagging-function whose range can be shown to be finite and whose tagging-relation refines @{term "\(L r)"}. - Let us attempt the @{const ALT}-case first. + Let us attempt the @{const PLUS}-case first. - \begin{proof}[@{const "ALT"}-Case] + \begin{proof}[@{const "PLUS"}-Case] We take as tagging-function % \begin{center} @@ -1106,7 +1107,7 @@ where @{text "A"} and @{text "B"} are some arbitrary languages. We can show in general, if @{term "finite (UNIV // \A)"} and @{term "finite (UNIV // \B)"} then @{term "finite ((UNIV // \A) \ (UNIV // \B))"} holds. The range of - @{term "tag_str_ALT A B"} is a subset of this product set---so finite. It remains to be shown + @{term "tag_str_PLUS A B"} is a subset of this product set---so finite. It remains to be shown that @{text "=tag\<^isub>A\<^isub>L\<^isub>T A B="} refines @{term "\(A \ B)"}. This amounts to showing % @@ -1134,7 +1135,7 @@ \noindent The pattern in \eqref{pattern} is repeated for the other two cases. Unfortunately, - they are slightly more complicated. In the @{const SEQ}-case we essentially have + they are slightly more complicated. In the @{const TIMES}-case we essentially have to be able to infer that % \begin{center} @@ -1239,14 +1240,14 @@ with the idea that in the first split we have to make sure that @{text "(x - x') @ z"} is in the language @{text B}. - \begin{proof}[@{const SEQ}-Case] + \begin{proof}[@{const TIMES}-Case] If @{term "finite (UNIV // \A)"} and @{term "finite (UNIV // \B)"} then @{term "finite ((UNIV // \A) \ (Pow (UNIV // \B)))"} holds. The range of - @{term "tag_str_SEQ A B"} is a subset of this product set, and therefore finite. + @{term "tag_str_TIMES A B"} is a subset of this product set, and therefore finite. We have to show injectivity of this tagging-function as % \begin{center} - @{term "\z. tag_str_SEQ A B x = tag_str_SEQ A B y \ x @ z \ A \ B \ y @ z \ A \ B"} + @{term "\z. tag_str_TIMES A B x = tag_str_TIMES A B y \ x @ z \ A \ B \ y @ z \ A \ B"} \end{center} % \noindent @@ -1259,7 +1260,7 @@ \end{center} % \noindent - and by the assumption about @{term "tag_str_SEQ A B"} also + and by the assumption about @{term "tag_str_TIMES A B"} also % \begin{center} @{term "(\B `` {x - x'}) \ ({\B `` {y - y'} |y'. y' \ y \ y' \ A})"} @@ -1274,15 +1275,15 @@ @{term "y @ z \ A \ B"}, as needed in this case. Second, there exists a @{text "z'"} such that @{term "x @ z' \ A"} and @{text "z - z' \ B"}. - By the assumption about @{term "tag_str_SEQ A B"} we have + By the assumption about @{term "tag_str_TIMES A B"} we have @{term "\A `` {x} = \A `` {y}"} and thus @{term "x \A y"}. Which means by the Myhill-Nerode relation that @{term "y @ z' \ A"} holds. Using @{text "z - z' \ B"}, we can conclude also in this case - with @{term "y @ z \ A \ B"}. We again can complete the @{const SEQ}-case + with @{term "y @ z \ A \ B"}. We again can complete the @{const TIMES}-case by setting @{text A} to @{term "L r\<^isub>1"} and @{text B} to @{term "L r\<^isub>2"}.\qed \end{proof} \noindent - The case for @{const STAR} is similar to @{const SEQ}, but poses a few extra challenges. When + The case for @{const STAR} is similar to @{const TIMES}, but poses a few extra challenges. When we analyse the case that @{text "x @ z"} is an element in @{term "A\"} and @{text x} is not the empty string, we have the following picture: @@ -1390,8 +1391,14 @@ section {* Second Part based on Partial Derivatives *} text {* - We briefly considered using the method Brzozowski presented in the - Appendix of~\cite{Brzozowski64} in order to prove the second + \noindent + As we have seen in the previous section, in order to establish + the second direction of the Myhill-Nerode theorem, we need to find + a more refined relation (more refined than ??) for which we can + show that there are only finitely many equivalence classes. + Brzozowski presented in the Appendix of~\cite{Brzozowski64} + + in order to prove the second direction of the Myhill-Nerode theorem. There he calculates the derivatives for regular expressions and shows that for every language there can be only finitely many of them %derivations (if @@ -1404,6 +1411,9 @@ of derivatives modulo ACI. Therefore we preferred our direct method of using tagging-functions. + The problem of finiteness modulo ACI can be avoided by using partial + derivatives introduced by Antimirov \cite{Antimirov}. + *} section {* Closure Properties *} @@ -1501,11 +1511,12 @@ establishing the Myhill-Nerode theorem. All standard proofs of this direction proceed by arguments over automata.\medskip - We expect that the development of Krauss \& Nipkoe gets easier by - using partial derivatives. + We expect that the development of Krauss \& Nipkow gets easier by + using partial derivatives.\medskip \noindent - {\bf Acknowledgements:} We are grateful for the comments we received from Larry + {\bf Acknowledgements:} + We are grateful for the comments we received from Larry Paulson. *}