regexp: comparison Journal/Paper.thy

equal deleted inserted replaced

-:92ca56c1a199
+:288637d9dcde
 (Definition~\ref{baddef}). This is because we cannot make a
 definition in HOL that is only polymorphic in the state type, but
 not in the predicate for regularity; and there is no type
 quantification available in HOL.\footnote{Slind already pointed out
 this problem in an email to the HOL4 mailing list on 21st April
-2005.} Coq, for example, has quantification over types and thus can
+2005.} Systems such as Coq permit quantification over types and thus can
-state such a definition.  This has been recently exploited in a
+state such a definition.  This has been recently exploited in an
-slick formalisation of the Myhill-Nerode theorem in Coq by
+elegant and short formalisation of the Myhill-Nerode theorem in Coq by
-\citeN{XXX}.
+\citeN{Smolka13}, but as said this is not avaible to us working in Isabelle/HOL.
 An alternative, which provides us with a single type for states of automata,
 is to give every state node an identity, for example a natural number, and
 then be careful to rename these identities apart whenever connecting two
 automata. This results in clunky proofs establishing that properties are
 \begin{quote}
 ``{\it \ldots [b]ut is there a rigorous way to even state that?}''
 \end{quote}
+\noindent
+We shall give an answer to these questions with our paper.
 %Moreover, no side-conditions will be needed for regular expressions,
 %like we need for automata.
 The convenience of regular expressions has
 finitely many terms of the form @{text "(Y\<^sub>i\<^sub>j, ATOM c\<^sub>i\<^sub>j)"} in a right-hand side
 since by assumption there are only finitely many
 equivalence classes and only finitely many characters.
 The term @{text "\<lambda>(ONE)"} in the first equation acts as a marker for the initial state, that
 is the equivalence class
-containing the empty string @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
+containing the empty string @{text "[]"}.
-single `initial' state in the equational system, which is different from
-the method by \citeN{Brzozowski64}, where he marks the
-`terminal' states. We are forced to set up the equational system in our
-way, because the Myhill-Nerode Relation determines the `direction' of the
-transitions---the successor `state' of an equivalence class @{text Y} can
-be reached by adding a character to the end of @{text Y}. This is also the
-reason why we have to use our reversed version of Arden's Lemma.}
 In our running example we have the initial equational system
 %
 \begin{equation}\label{exmpcs}
 \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
 @{term "X\<^sub>1"} & @{text "="} & @{text "\<lambda>(ONE)"}\\
 @{term "X\<^sub>4"} & @{text "="} & @{text "(X\<^sub>1, ATOM b) + (X\<^sub>2, ATOM a) + (X\<^sub>3, ATOM a)"}\\
 & & \mbox{}\hspace{0mm}@{text "+ (X\<^sub>3, ATOM b) + (X\<^sub>4, ATOM a) + (X\<^sub>4, ATOM b)"}\\
 \end{tabular}}
 \end{equation}
-\noindent
+Note that we mark, roughly speaking, the
+single `initial' state in the equational system, which is different from
+the method by \citeN{Brzozowski64}, where he marks the
+`terminal' states. We are forced to set up the equational system in our
+way, because the Myhill-Nerode Relation determines the `direction' of the
+transitions---the successor `state' of an equivalence class @{text Y} can
+be reached by adding a character to the end of @{text Y}. If we had defined
+our equations the `Brzozowski-way', then our variables do not correspond to
+the equivalence classes generated by the Myhill-Nerode relation. This need of reverse marking is also the
+reason why we have to use our reversed version of Arden's Lemma.
 Overloading the function @{text \<calL>} for the two kinds of terms in the
 equational system, we have
 \begin{center}
 @{text "\<calL>(Y, r) \<equiv>"} %
 \begin{equation}\label{inv2}
 @{text "X\<^sub>1 = \<calL>(Y\<^sub>1\<^sub>1, ATOM c\<^sub>1\<^sub>1) \<union> \<dots> \<union> \<calL>(Y\<^sub>1\<^sub>p, ATOM c\<^sub>1\<^sub>p) \<union> \<calL>(\<lambda>(ONE))"}.
 \end{equation}
 \noindent
-holds. The reason for adding the @{text \<lambda>}-marker to our initial equational system is
+holds. Again note that the reason for adding the @{text \<lambda>}-marker to our initial equational system is
 to obtain this equation: it only holds with the marker, since none of
-the other terms contain the empty string. The point of the initial equational system is
+the other terms contain the empty string.
-that solving it means we will be able to extract a regular expression for every equivalence class.
+The point of the initial equational system is
+that solving it means we will be able to extract a regular expression for every equivalence class
+generated by the Myhill-Nerode relation.
 Our representation for the equations in Isabelle/HOL are pairs,
 where the first component is an equivalence class (a set of strings)
 and the second component
 is a set of terms. Given a set of equivalence
 \end{equation}
 \noindent
 Carrying this idea through, we must not consider the set of all derivatives,
 but the one modulo @{text "ACI"}.  In principle, this can be done formally,
-but it is very painful in a theorem prover (since there is no
+but it is very painful in a theorem prover---since there is no
-direct characterisation of the set of dissimilar derivatives).
+direct characterisation of the set of dissimilar derivatives. Therefore
+\citeN{CoquandSiles12} who follow through this idea had to `stand on their  heads' and
+first formalise the quite intricate notion of \emph{inductively finite sets} in
+order to formalise this property.
 Fortunately, there is a much simpler approach using \emph{partial
 derivatives}. They were introduced by \citeN{Antimirov95} and can be defined
 in Isabelle/HOL as follows:
 our formalisation \cite{myhillnerodeafp11} can be found in the Archive of Formal Proofs at
 \mbox{\url{http://afp.sourceforge.net/entries/Myhill-Nerode.shtml}}.
 Our future work will focus on formalising the regular expression matchers
 developed by \citeN{Sulzmann12} which generate variable assignments for
-regular expression submatching.\smallskip
+regular expression sub-matching. For this they use both, derivatives and
+partial derivatives of regular expressions.\smallskip
 \noindent
 {\bf Acknowledgements:}
 We are grateful for the comments we received from Larry Paulson.  Tobias
 Nipkow made us aware of the properties in Theorem~\ref{subseqreg} and Tjark

changeset 387	288637d9dcde
parent 386	92ca56c1a199
child 388	0da31edd95b9