regexp: comparison Journal/Paper.thy


-:50cc1a39c990
+:c6ebfe052109
 \end{tabular}
 \end{center}
 \noindent
-On `paper' or a theorem prover based on set-theory, we can define the corresponding
+On `paper' we can define the corresponding
 graph in terms of the disjoint
 union of the state nodes. Unfortunately in HOL, the standard definition for disjoint
 union, namely
 %
 \begin{equation}\label{disjointunion}
 changes the type---the disjoint union is not a set, but a set of
 pairs. Using this definition for disjoint union means we do not have a
 single type for automata. As a result we will not be able to define a regular
 language as one for which there exists an automaton that recognises all its
 strings. This is because we cannot make a definition in HOL that is polymorphic in
-the state type and also there is no type quantification available in HOL (unlike
+the state type and there is no type quantification available in HOL (unlike
 in Coq, for example).
 An alternative, which provides us with a single type for automata, is to give every
 state node an identity, for example a natural
 number, and then be careful to rename these identities apart whenever
 In this paper, we will not attempt to formalise automata theory in
 Isabelle/HOL nor will we attempt to formalise automata proofs from the
 literature, but take a different approach to regular languages than is
 usually taken. Instead of defining a regular language as one where there
-exists an automaton that recognises all strings of the language, we define a
+exists an automaton that recognises all its strings, we define a
 regular language as:
 \begin{dfntn}
 A language @{text A} is \emph{regular}, provided there is a regular expression that matches all
 strings of @{text "A"}.
 \noindent
 The reason is that regular expressions, unlike graphs, matrices and
 functions, can be easily defined as an inductive datatype. A reasoning
 infrastructure (like induction and recursion) comes then for free in
 HOL. Moreover, no side-conditions will be needed for regular expressions,
-like we usually need for automata. This convenience of regular expressions has
+like we need for automata. This convenience of regular expressions has
 recently been exploited in HOL4 with a formalisation of regular expression
 matching based on derivatives \cite{OwensSlind08} and with an equivalence
 checker for regular expressions in Isabelle/HOL \cite{KraussNipkow11}.  The
 main purpose of this paper is to show that a central result about regular
 languages---the Myhill-Nerode theorem---can be recreated by only using
 an argument about solving equational systems.  This argument appears to be
 folklore. For the other part, we give two proofs: one direct proof using
 certain tagging-functions, and another indirect proof using Antimirov's
 partial derivatives \cite{Antimirov95}. Again, to the best of our knowledge, the
 tagging-functions have not been used before to establish the Myhill-Nerode
-theorem. Derivatives of regular expressions have been used widely in the
+theorem. Derivatives of regular expressions have been recently used quite
-literature about regular expressions. However, partial derivatives are more
+widely in the literature about regular expressions. However, partial
-suitable in the context of the Myhill-Nerode theorem, since it is easier to
+derivatives are more suitable in the context of the Myhill-Nerode theorem,
-establish formally their finiteness result.
+since it is easier to establish formally their finiteness result.
 *}
 section {* Preliminaries *}
 \noindent
 which means that if we concatenate the character @{text c} to the end of all
 strings in the equivalence class @{text Y}, we obtain a subset of
 @{text X}. Note that we do not define an automaton here, we merely relate two sets
 (with the help of a character). In our concrete example we have
-@{term "X\<^isub>1 \<Turnstile>c\<Rightarrow> X\<^isub>2"}, @{term "X\<^isub>1 \<Turnstile>d\<Rightarrow> X\<^isub>3"} with @{text d} being any
+@{term "X\<^isub>1 \<Turnstile>c\<Rightarrow> X\<^isub>2"}, @{term "X\<^isub>1 \<Turnstile>d\<^isub>i\<Rightarrow> X\<^isub>3"} with @{text "d\<^isub>i"} being any
-other character than @{text c}, and @{term "X\<^isub>3 \<Turnstile>d\<Rightarrow> X\<^isub>3"} for any @{text d}.
+other character than @{text c}, and @{term "X\<^isub>3 \<Turnstile>c\<^isub>j\<Rightarrow> X\<^isub>3"} for any
+character @{text "c\<^isub>j"}.
 Next we construct an \emph{initial equational system} that
 contains an equation for each equivalence class. We first give
 an informal description of this construction. Suppose we have
 the equivalence classes @{text "X\<^isub>1,\<dots>,X\<^isub>n"}, there must be one and only one that
 \end{equation}
 \noindent
 where @{text "d\<^isub>1\<dots>d\<^isub>n"} is the sequence of all characters
 except @{text c}, and @{text "c\<^isub>1\<dots>c\<^isub>m"} is the sequence of all
-characters.  In our initial equation systems there can only be finitely many
+characters.
-terms of the form @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"},
-since by assumption there are only finitely many equivalence classes and
-only finitely many characters.
 Overloading the function @{text \<calL>} for the two kinds of terms in the
 equational system, we have
 \begin{center}
 \noindent
 In this definition, we first delete all terms of the form @{text "(X, r)"} from @{text rhs};
 then we calculate the combined regular expressions for all @{text r} coming
 from the deleted @{text "(X, r)"}, and take the @{const Star} of it;
-finally we append this regular expression to @{text rhs'}. It can be easily seen
+finally we append this regular expression to @{text rhs'}. If we apply this
-that this operation mimics Arden's Lemma on the level of equations. To ensure
+operation to the right-hand side of @{text "X\<^isub>3"} in \eqref{exmpcs}, we obtain
-the non-emptiness condition of Arden's Lemma we say that a right-hand side is
+the equation:
-@{text ardenable} provided
+\begin{center}
+\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
+@{term "X\<^isub>3"} & @{text "="} &
+@{text "(X\<^isub>1, TIMES (ATOM d\<^isub>1) (STAR \<^raw:\ensuremath{\bigplus}>{ATOM c\<^isub>1,\<dots>,ATOM c\<^isub>m})) + \<dots> "}\\
+& & \mbox{}\hspace{13mm}
+@{text "\<dots> + (X\<^isub>1, TIMES (ATOM d\<^isub>n) (STAR \<^raw:\ensuremath{\bigplus}>{ATOM c\<^isub>1,\<dots>,ATOM c\<^isub>m}))"}
+\end{tabular}
+\end{center}
+\noindent
+That means we eliminated the dependency of @{text "X\<^isub>3"} on the
+right-hand side.  Note we used the abbreviation
+@{text "\<^raw:\ensuremath{\bigplus}>{ATOM c\<^isub>1,\<dots>,ATOM c\<^isub>m}"}
+to stand for a regular expression that matches every character. In
+our algorithm we are only interested in the existence of such a regular expression
+and do not specify it any further.
+It can be easily seen that the @{text "Arden"}-operation mimics Arden's
+Lemma on the level of equations. To ensure the non-emptiness condition of
+Arden's Lemma we say that a right-hand side is @{text ardenable} provided
 \begin{center}
 @{thm ardenable_def}
 \end{center}
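The three steps of the @{text "Arden"}-operation described above can be sketched outside Isabelle as follows. This is only an illustrative Python model, not the formal definition from the theory: regular expressions and terms are encoded as tagged tuples, the names \texttt{big\_plus}, \texttt{append\_rexp} and \texttt{arden} are hypothetical, the constructors \texttt{ZERO} and \texttt{PLUS} are assumed, and the alphabet is assumed to consist of just the two characters @{text c} and @{text d}.

```python
from functools import reduce

# Regular expressions as tagged tuples (a stand-in for the inductive datatype).
ZERO = ("ZERO",)
def ATOM(c): return ("ATOM", c)
def PLUS(r1, r2): return ("PLUS", r1, r2)
def TIMES(r1, r2): return ("TIMES", r1, r2)
def STAR(r): return ("STAR", r)

# The two kinds of terms on a right-hand side: lambda(r) and (X, r).
def Lam(r): return ("Lam", r)
def Trn(X, r): return ("Trn", X, r)

def big_plus(rs):
    """Fold a collection of regular expressions with PLUS (ZERO if empty)."""
    rs = sorted(rs, key=repr)  # fixed order so that results are reproducible
    return reduce(PLUS, rs) if rs else ZERO

def append_rexp(term, r):
    """Append r on the right of the regular expression inside a term."""
    if term[0] == "Lam":
        return Lam(TIMES(term[1], r))
    return Trn(term[1], TIMES(term[2], r))

def arden(X, rhs):
    """Delete all (X, r), take the STAR of the combined r, append it to the rest."""
    deleted = [t[2] for t in rhs if t[0] == "Trn" and t[1] == X]
    rest = [t for t in rhs if not (t[0] == "Trn" and t[1] == X)]
    star = STAR(big_plus(deleted))
    return frozenset(append_rexp(t, star) for t in rest)

# X3 = (X1, ATOM d) + (X3, ATOM c) + (X3, ATOM d) over the assumed alphabet {c, d}
rhs_x3 = frozenset({Trn("X1", ATOM("d")),
                    Trn("X3", ATOM("c")),
                    Trn("X3", ATOM("d"))})
result = arden("X3", rhs_x3)
```

In this model \texttt{result} contains a single term for @{text "X\<^isub>1"} whose regular expression is \texttt{ATOM d} followed by the star of \texttt{PLUS (ATOM c) (ATOM d)}; no term mentioning @{text "X\<^isub>3"} remains.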
 \noindent
 We again delete first all occurrences of @{text "(X, r)"} in @{text rhs}; we then calculate
 the regular expression corresponding to the deleted terms; finally we append this
 regular expression to @{text "xrhs"} and union it up with @{text rhs'}. When we use
 the substitution operation we will arrange it so that @{text "xrhs"} does not contain
-any occurrence of @{text X}.
+any occurrence of @{text X}. For example, substituting the first equation in
+\eqref{exmpcs} into the right-hand side of the second, thus eliminating the equivalence
+class @{text "X\<^isub>1"}, gives us the equation
+\begin{equation}\label{exmpresult}
+\mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
+@{term "X\<^isub>2"} & @{text "="} & @{text "\<lambda>(TIMES ONE (ATOM c))"}\\
+\end{tabular}}
+\end{equation}
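The substitution step can be replayed in a small Python model. This is a sketch under assumptions, not the formal @{text Subst} definition: terms are encoded as tagged tuples, the names \texttt{big\_plus}, \texttt{append\_rexp} and \texttt{subst} are hypothetical, and the two equations are assumed to be @{text "X\<^isub>1 = \<lambda>(ONE)"} and @{text "X\<^isub>2 = (X\<^isub>1, ATOM c)"}, which is consistent with the result shown above but not reproduced in this excerpt.

```python
from functools import reduce

# Regular expressions as tagged tuples (a stand-in for the inductive datatype).
ZERO, ONE = ("ZERO",), ("ONE",)
def ATOM(c): return ("ATOM", c)
def PLUS(r1, r2): return ("PLUS", r1, r2)
def TIMES(r1, r2): return ("TIMES", r1, r2)

# The two kinds of terms on a right-hand side: lambda(r) and (X, r).
def Lam(r): return ("Lam", r)
def Trn(X, r): return ("Trn", X, r)

def big_plus(rs):
    """Fold a collection of regular expressions with PLUS (ZERO if empty)."""
    rs = sorted(rs, key=repr)  # fixed order so that results are reproducible
    return reduce(PLUS, rs) if rs else ZERO

def append_rexp(term, r):
    """Append r on the right of the regular expression inside a term."""
    if term[0] == "Lam":
        return Lam(TIMES(term[1], r))
    return Trn(term[1], TIMES(term[2], r))

def subst(rhs, X, xrhs):
    """Replace every (X, r) in rhs by xrhs with the combined r appended."""
    deleted = [t[2] for t in rhs if t[0] == "Trn" and t[1] == X]
    rest = frozenset(t for t in rhs if not (t[0] == "Trn" and t[1] == X))
    r = big_plus(deleted)
    return rest | frozenset(append_rexp(t, r) for t in xrhs)

# Assumed equations: X1 = lambda(ONE) and X2 = (X1, ATOM c)
xrhs_x1 = frozenset({Lam(ONE)})
rhs_x2 = frozenset({Trn("X1", ATOM("c"))})
new_rhs_x2 = subst(rhs_x2, "X1", xrhs_x1)
```

Under these assumptions \texttt{new\_rhs\_x2} consists of the single term \texttt{Lam (TIMES ONE (ATOM c))}, matching the equation for @{text "X\<^isub>2"} above.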
 With these two operations in place, we can define the operation that removes one equation
 from an equational system @{text ES}. The operation @{const Subst_all}
 substitutes an equation @{text "X = xrhs"} throughout an equational system @{text ES};
 @{const Remove} then completely removes such an equation from @{text ES} by substituting

changeset 178	c6ebfe052109
parent 177	50cc1a39c990
child 179	edacc141060f