Journal/Paper.thy
changeset 387 288637d9dcde
parent 386 92ca56c1a199
child 388 0da31edd95b9
equal deleted inserted replaced
386:92ca56c1a199 387:288637d9dcde
   339   (Definition~\ref{baddef}). This is because we cannot make a
   339   (Definition~\ref{baddef}). This is because we cannot make a
   340   definition in HOL that is only polymorphic in the state type, but
   340   definition in HOL that is only polymorphic in the state type, but
   341   not in the predicate for regularity; and there is no type
   341   not in the predicate for regularity; and there is no type
   342   quantification available in HOL.\footnote{Slind already pointed out
   342   quantification available in HOL.\footnote{Slind already pointed out
   343   this problem in an email to the HOL4 mailing list on 21st April
   343   this problem in an email to the HOL4 mailing list on 21st April
   344   2005.} Coq, for example, has quantification over types and thus can
   344   2005.} Systems such as Coq permit quantification over types and thus can
   345   state such a definition.  This has been recently exploited in a
   345   state such a definition.  This has been recently exploited in an
   346   slick formalisation of the Myhill-Nerode theorem in Coq by
   346   elegant and short formalisation of the Myhill-Nerode theorem in Coq by
   347   \citeN{XXX}.
   347   \citeN{Smolka13}, but as said this is not avaible to us working in Isabelle/HOL.
   348 
   348 
   349   An alternative, which provides us with a single type for states of automata,
   349   An alternative, which provides us with a single type for states of automata,
   350   is to give every state node an identity, for example a natural number, and
   350   is to give every state node an identity, for example a natural number, and
   351   then be careful to rename these identities apart whenever connecting two
   351   then be careful to rename these identities apart whenever connecting two
   352   automata. This results in clunky proofs establishing that properties are
   352   automata. This results in clunky proofs establishing that properties are
   468 
   468 
   469   \begin{quote}
   469   \begin{quote}
   470   ``{\it \ldots [b]ut is there a rigorous way to even state that?}'' 
   470   ``{\it \ldots [b]ut is there a rigorous way to even state that?}'' 
   471   \end{quote} 
   471   \end{quote} 
   472 
   472 
       
   473   \noindent
       
   474   We shall give an answer to these questions with our paper.
       
   475 
   473   %Moreover, no side-conditions will be needed for regular expressions,
   476   %Moreover, no side-conditions will be needed for regular expressions,
   474   %like we need for automata. 
   477   %like we need for automata. 
   475  
   478  
   476 
   479 
   477   The convenience of regular expressions has
   480   The convenience of regular expressions has
   771   finitely many terms of the form @{text "(Y\<^sub>i\<^sub>j, ATOM c\<^sub>i\<^sub>j)"} in a right-hand side 
   774   finitely many terms of the form @{text "(Y\<^sub>i\<^sub>j, ATOM c\<^sub>i\<^sub>j)"} in a right-hand side 
   772   since by assumption there are only finitely many
   775   since by assumption there are only finitely many
   773   equivalence classes and only finitely many characters.
   776   equivalence classes and only finitely many characters.
   774   The term @{text "\<lambda>(ONE)"} in the first equation acts as a marker for the initial state, that
   777   The term @{text "\<lambda>(ONE)"} in the first equation acts as a marker for the initial state, that
   775   is the equivalence class
   778   is the equivalence class
   776   containing the empty string @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
   779   containing the empty string @{text "[]"}.
   777   single `initial' state in the equational system, which is different from
       
   778   the method by \citeN{Brzozowski64}, where he marks the
       
   779   `terminal' states. We are forced to set up the equational system in our
       
   780   way, because the Myhill-Nerode Relation determines the `direction' of the
       
   781   transitions---the successor `state' of an equivalence class @{text Y} can
       
   782   be reached by adding a character to the end of @{text Y}. This is also the
       
   783   reason why we have to use our reversed version of Arden's Lemma.}
       
   784   In our running example we have the initial equational system
   780   In our running example we have the initial equational system
   785   %
   781   %
   786   \begin{equation}\label{exmpcs}
   782   \begin{equation}\label{exmpcs}
   787   \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
   783   \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
   788   @{term "X\<^sub>1"} & @{text "="} & @{text "\<lambda>(ONE)"}\\
   784   @{term "X\<^sub>1"} & @{text "="} & @{text "\<lambda>(ONE)"}\\
   791   @{term "X\<^sub>4"} & @{text "="} & @{text "(X\<^sub>1, ATOM b) + (X\<^sub>2, ATOM a) + (X\<^sub>3, ATOM a)"}\\
   787   @{term "X\<^sub>4"} & @{text "="} & @{text "(X\<^sub>1, ATOM b) + (X\<^sub>2, ATOM a) + (X\<^sub>3, ATOM a)"}\\
   792           & & \mbox{}\hspace{0mm}@{text "+ (X\<^sub>3, ATOM b) + (X\<^sub>4, ATOM a) + (X\<^sub>4, ATOM b)"}\\
   788           & & \mbox{}\hspace{0mm}@{text "+ (X\<^sub>3, ATOM b) + (X\<^sub>4, ATOM a) + (X\<^sub>4, ATOM b)"}\\
   793   \end{tabular}}
   789   \end{tabular}}
   794   \end{equation}
   790   \end{equation}
   795 
   791 
   796   \noindent
   792   Note that we mark, roughly speaking, the
       
   793   single `initial' state in the equational system, which is different from
       
   794   the method by \citeN{Brzozowski64}, where he marks the
       
   795   `terminal' states. We are forced to set up the equational system in our
       
   796   way, because the Myhill-Nerode Relation determines the `direction' of the
       
   797   transitions---the successor `state' of an equivalence class @{text Y} can
       
   798   be reached by adding a character to the end of @{text Y}. If we had defined
       
   799   our equations the `Brzozowski-way', then our variables do not correspond to 
       
   800   the equivalence classes generated by the Myhill-Nerode relation. This need of reverse marking is also the
       
   801   reason why we have to use our reversed version of Arden's Lemma.
       
   802 
   797   Overloading the function @{text \<calL>} for the two kinds of terms in the
   803   Overloading the function @{text \<calL>} for the two kinds of terms in the
   798   equational system, we have
   804   equational system, we have
   799   
   805   
   800   \begin{center}
   806   \begin{center}
   801   @{text "\<calL>(Y, r) \<equiv>"} %
   807   @{text "\<calL>(Y, r) \<equiv>"} %
   816   \begin{equation}\label{inv2}
   822   \begin{equation}\label{inv2}
   817   @{text "X\<^sub>1 = \<calL>(Y\<^sub>1\<^sub>1, ATOM c\<^sub>1\<^sub>1) \<union> \<dots> \<union> \<calL>(Y\<^sub>1\<^sub>p, ATOM c\<^sub>1\<^sub>p) \<union> \<calL>(\<lambda>(ONE))"}.
   823   @{text "X\<^sub>1 = \<calL>(Y\<^sub>1\<^sub>1, ATOM c\<^sub>1\<^sub>1) \<union> \<dots> \<union> \<calL>(Y\<^sub>1\<^sub>p, ATOM c\<^sub>1\<^sub>p) \<union> \<calL>(\<lambda>(ONE))"}.
   818   \end{equation}
   824   \end{equation}
   819 
   825 
   820   \noindent
   826   \noindent
   821   holds. The reason for adding the @{text \<lambda>}-marker to our initial equational system is 
   827   holds. Again note that the reason for adding the @{text \<lambda>}-marker to our initial equational system is 
   822   to obtain this equation: it only holds with the marker, since none of 
   828   to obtain this equation: it only holds with the marker, since none of 
   823   the other terms contain the empty string. The point of the initial equational system is
   829   the other terms contain the empty string. 
   824   that solving it means we will be able to extract a regular expression for every equivalence class. 
   830   The point of the initial equational system is
       
   831   that solving it means we will be able to extract a regular expression for every equivalence class
       
   832   generated by the Myhill-Nerode relation. 
   825 
   833 
   826   Our representation for the equations in Isabelle/HOL are pairs,
   834   Our representation for the equations in Isabelle/HOL are pairs,
   827   where the first component is an equivalence class (a set of strings)
   835   where the first component is an equivalence class (a set of strings)
   828   and the second component
   836   and the second component
   829   is a set of terms. Given a set of equivalence
   837   is a set of terms. Given a set of equivalence
  1956   \end{equation}
  1964   \end{equation}
  1957 
  1965 
  1958   \noindent
  1966   \noindent
  1959   Carrying this idea through, we must not consider the set of all derivatives,
  1967   Carrying this idea through, we must not consider the set of all derivatives,
  1960   but the one modulo @{text "ACI"}.  In principle, this can be done formally, 
  1968   but the one modulo @{text "ACI"}.  In principle, this can be done formally, 
  1961   but it is very painful in a theorem prover (since there is no
  1969   but it is very painful in a theorem prover---since there is no
  1962   direct characterisation of the set of dissimilar derivatives).
  1970   direct characterisation of the set of dissimilar derivatives. Therefore
       
  1971   \citeN{CoquandSiles12} who follow through this idea had to `stand on their  heads' and
       
  1972   first formalise the quite intricate notion of \emph{inductively finite sets} in 
       
  1973   order to formalise this property.
  1963 
  1974 
  1964 
  1975 
  1965   Fortunately, there is a much simpler approach using \emph{partial
  1976   Fortunately, there is a much simpler approach using \emph{partial
  1966   derivatives}. They were introduced by \citeN{Antimirov95} and can be defined
  1977   derivatives}. They were introduced by \citeN{Antimirov95} and can be defined
  1967   in Isabelle/HOL as follows:
  1978   in Isabelle/HOL as follows:
  2666   our formalisation \cite{myhillnerodeafp11} can be found in the Archive of Formal Proofs at 
  2677   our formalisation \cite{myhillnerodeafp11} can be found in the Archive of Formal Proofs at 
  2667   \mbox{\url{http://afp.sourceforge.net/entries/Myhill-Nerode.shtml}}.
  2678   \mbox{\url{http://afp.sourceforge.net/entries/Myhill-Nerode.shtml}}.
  2668   
  2679   
  2669   Our future work will focus on formalising the regular expression matchers
  2680   Our future work will focus on formalising the regular expression matchers
  2670   developed by \citeN{Sulzmann12} which generate variable assignments for
  2681   developed by \citeN{Sulzmann12} which generate variable assignments for
  2671   regular expression submatching.\smallskip
  2682   regular expression sub-matching. For this they use both, derivatives and
       
  2683   partial derivatives of regular expressions.\smallskip
  2672 
  2684 
  2673   \noindent
  2685   \noindent
  2674   {\bf Acknowledgements:}
  2686   {\bf Acknowledgements:}
  2675   We are grateful for the comments we received from Larry Paulson.  Tobias
  2687   We are grateful for the comments we received from Larry Paulson.  Tobias
  2676   Nipkow made us aware of the properties in Theorem~\ref{subseqreg} and Tjark
  2688   Nipkow made us aware of the properties in Theorem~\ref{subseqreg} and Tjark