regexp: comparison Paper/Paper.thy


-:e7e4e490326b
+:7c68b9ad4486
 the accepting and non-accepting states in the corresponding automaton to
 obtain an automaton for the complement language.  The problem, however, lies with
 formalising such reasoning in a HOL-based theorem prover, in our case
 Isabelle/HOL. Automata are built up from states and transitions that
 need to be represented as graphs, matrices or functions, none
-of which can be defined as inductive datatype.
+of which can be defined as an inductive datatype.
 In case of graphs and matrices, this means we have to build our own
 reasoning infrastructure for them, as neither Isabelle/HOL nor HOL4 nor
 HOLlight support them with libraries. Even worse, reasoning about graphs and
 matrices can be a real hassle in HOL-based theorem provers.  Consider for
 \noindent
 changes the type---the disjoint union is not a set, but a set of pairs.
 Using this definition for disjoint union means we do not have a single type for automata
 and hence will not be able to state certain properties about \emph{all}
-automata, since there is no type quantification available in HOL. An
+automata, since there is no type quantification available in HOL (unlike in Coq, for example). An
 alternative, which provides us with a single type for automata, is to give every
 state node an identity, for example a natural
 number, and then be careful to rename these identities apart whenever
 connecting two automata. This results in clunky proofs
 establishing that properties are invariant under renaming. Similarly,
 text {*
 The key definition in the Myhill-Nerode theorem is the
 \emph{Myhill-Nerode relation}, which states that w.r.t.~a language two
 strings are related, provided there is no distinguishing extension in this
-language. This can be defined as tertiary relation.
+language. This can be defined as a ternary relation.
 \begin{definition}[Myhill-Nerode Relation] Given a language @{text A}, two strings @{text x} and
 @{text y} are Myhill-Nerode related provided
 \begin{center}
 @{thm str_eq_def[simplified str_eq_rel_def Pair_Collect]}
 then we calculate the combined regular expressions for all @{text r} coming
 from the deleted @{text "(X, r)"}, and take the @{const STAR} of it;
 finally we append this regular expression to @{text rhs'}. It can be easily seen
 that this operation mimics Arden's lemma on the level of equations. To ensure
 the non-emptiness condition of Arden's lemma we say that a right-hand side is
-\emph{ardenable} provided
+@{text ardenable} provided
 \begin{center}
 @{thm ardenable_def}
 \end{center}
 \end{center}
 \noindent
 Finally, we can define how an equational system should be solved. For this
 we will need to iterate the process of eliminating equations until only one equation
-will be left in the system. However, we not just want to have any equation
+will be left in the system. However, we do not just want to have any equation
 as being the last one, but the one involving the equivalence class for
 which we want to calculate the regular
 expression. Let us suppose this equivalence class is @{text X}.
 Since @{text X} is the one to be solved, in every iteration step we have to pick an
 equation to be eliminated that is different from @{text X}. In this way
 This principle states that given an invariant (which we will specify below)
 we can prove a property
 @{text "P"} involving @{const Solve}. For this we have to discharge the following
 proof obligations: first the
 initial equational system satisfies the invariant; second the iteration
-step @{text "Iter"} preserves the the invariant as long as the condition @{term Cond} holds;
+step @{text "Iter"} preserves the invariant as long as the condition @{term Cond} holds;
 third @{text "Iter"} decreases the termination order, and fourth that
 once the condition does not hold anymore then the property @{text P} must hold.
 The property @{term P} in our proof will state that @{term "Solve X (Init (UNIV // \<approx>A))"}
 returns with a single equation @{text "X = xrhs"} for some @{text "xrhs"}, and
 \noindent
 The first two ensure that the equational system is always finite (number of equations
 and number of terms in each equation); the second makes sure the `meaning' of the
 equations is preserved under our transformations. The other properties are a bit more
 technical, but are needed to get our proof through. Distinctness states that every
-equation in the system is distinct. Ardenable ensures that we can always
+equation in the system is distinct. @{text Ardenable} ensures that we can always
 apply the arden operation.
 The last property states that every @{text rhs} can only contain equivalence classes
 for which there is an equation. Therefore @{text lhss} is just the set containing
 the first components of an equational system,
 while @{text "rhss"} collects all equivalence classes @{text X} in the terms of the
 \end{lemma}
 \begin{proof}
 Finiteness is given by the assumption and the way how we set up the
 initial equational system. Soundness is proved in Lem.~\ref{inv}. Distinctness
-follows from the fact that the equivalence classes are disjoint. The ardenable
+follows from the fact that the equivalence classes are disjoint. The @{text ardenable}
 property also follows from the setup of the initial equational system, as does
 validity.\qed
 \end{proof}
 \noindent
 \noindent
 Finiteness is straightforward, as @{const Subst} and @{const Arden} operations
 keep the equational system finite. These operations also preserve soundness
 and distinctness (we proved soundness for @{const Arden} in Lem.~\ref{ardenable}).
-The property ardenable is clearly preserved because the append-operation
+The property @{text ardenable} is clearly preserved because the append-operation
 cannot make a regular expression to match the empty string. Validity is
 given because @{const Arden} removes an equivalence class from @{text yrhs}
 and then @{const Subst_all} removes @{text Y} from the equational system.
 Having proved the implication above, we can instantiate @{text "ES"} with @{text "ES - {(Y, yrhs)}"}
 which matches with our proof-obligation of @{const "Subst_all"}. Since
 this is equal to \mbox{@{text "\<Union>\<calL> ` (Arden X rhs)"}} using the properties of the
 invariant and Lem.~\ref{ardenable}. Using the validity property for the equation @{text "X = rhs"},
 we can infer that @{term "rhss rhs \<subseteq> {X}"} and because the arden operation
 removes that @{text X} from @{text rhs}, that @{term "rhss (Arden X rhs) = {}"}.
 This means the right-hand side @{term "Arden X rhs"} can only consist of terms of the form @{term "Lam r"}.
-So we can collect those (finitely many) regular expressions and have @{term "X = L (\<Uplus>rs)"}.
+So we can collect those (finitely many) regular expressions @{text rs} and have @{term "X = L (\<Uplus>rs)"}.
 With this we can conclude the proof.\qed
 \end{proof}
 \noindent
 Lem.~\ref{every_eqcl_has_reg} allows us to finally give a proof for the first direction
 \noindent
 hold, which shows that @{term "UNIV // \<approx>(L r)"} must be finite.\qed
 \end{proof}
 \noindent
-Much more interesting, however, are the inductive cases. They seem hard to be solved
+Much more interesting, however, are the inductive cases. They seem hard to solve
 directly. The reader is invited to try.
 Our proof will rely on some
 \emph{tagging-functions} defined over strings. Given the inductive hypothesis, it will
 be easy to prove that the \emph{range} of these tagging-functions is finite
 text {*
 In this paper we took the view that a regular language is one where there
 exists a regular expression that matches all of its strings. Regular
 expressions can conveniently be defined as a datatype in HOL-based theorem
 provers. For us it was therefore interesting to find out how far we can push
-this point of view. We have established both directions of the Myhill-Nerode
+this point of view. We have established in Isabelle/HOL both directions
-theorem.
+of the Myhill-Nerode theorem.
 %
 \begin{theorem}[The Myhill-Nerode Theorem]\mbox{}\\
 A language @{text A} is regular if and only if @{thm (rhs) Myhill_Nerode}.
 \end{theorem}
 %
 lemma.
 We briefly considered using the method Brzozowski presented in the Appendix
 of~\cite{Brzozowski64} in order to prove the second direction of the
 Myhill-Nerode theorem. There he calculates the derivatives for regular
-expressions and shows that there can be only finitely many of them. We could
+expressions and shows that there can be only finitely many of them (if regarded equal
+modulo ACI). We could
 have used as the tag of a string @{text s} the derivative of a regular expression
 generated with respect to @{text s}.  Using the fact that two strings are
 Myhill-Nerode related whenever their derivative is the same, together with
 the fact that there are only finitely many derivatives for a regular
 expression would give us a similar argument as ours. However it seems not so easy to

changeset 154	7c68b9ad4486
parent 149	e122cb146ecc
child 156	fd39492b187c