Paper/Paper.thy
changeset 122 ab6637008963
parent 121 1cf12a107b03
child 123 23c0e6f2929d
equal deleted inserted replaced
121:1cf12a107b03 122:ab6637008963
  1056   steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
  1056   steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
  1057   "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
  1057   "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
  1058   regular expression.
  1058   regular expression.
  1059 
  1059 
  1060   While regular expressions are convenient in formalisations, they have some
  1060   While regular expressions are convenient in formalisations, they have some
  1061   limitations. One is that there seems to be no notion of a minimal regular
  1061   limitations. One is that there seems to be no method of calculating a
  1062   expression, like there is for automata. For an implementation of a simple
  1062   minimal regular expression (for example in terms of length), like there is
  1063   regular expression matcher, whose correctness has been formally
  1063   for automata. For an implementation of a simple regular expression matcher,
  1064   established, we refer the reader to Owens and Slind \cite{OwensSlind08}.
  1064   whose correctness has been formally established, we refer the reader to
       
  1065   Owens and Slind \cite{OwensSlind08}.
  1065 
  1066 
  1066 
  1067 
  1067   Our formalisation consists of 790 lines of Isabelle/Isar code for the first
  1068   Our formalisation consists of 790 lines of Isabelle/Isar code for the first
  1068   direction and 475 for the second, plus standard material about regular languages. 
  1069   direction and 475 for the second, plus 300 lines of standard material about
  1069   While this might be seen as too large to
  1070   regular languages. While this might be seen as too large to count as a
  1070   count as a concise proof pearl, this should be seen in the context of the
  1071   concise proof pearl, this should be seen in the context of the work done by
  1071   work done by Constable at al \cite{Constable00} who formalised the
  1072   Constable at al \cite{Constable00} who formalised the Myhill-Nerode theorem
  1072   Myhill-Nerode theorem in Nuprl using automata. 
  1073   in Nuprl using automata. They write that their four-member team needed
  1073   They write that their
  1074   something on the magnitute of 18 months for their formalisation. The
  1074   four-member team needed something on the magnitute of 18 months for their
  1075   estimate for our formalisation is that we needed approximately 3 months and
  1075   formalisation. The estimate for our formalisation is that we
  1076   this included the time to find our proof arguments. Unlike Constable et al,
  1076   needed approximately 3 months and this included the time to find our proof
  1077   who were able to follow the proofs from \cite{HopcroftUllman69}, we had to
  1077   arguments. Unlike Constable et al, who were able to follow the proofs from
  1078   find our own arguments.  So for us the formalisation was not the
  1078   \cite{HopcroftUllman69}, we had to find our own arguments.  So for us the formalisation
  1079   bottleneck. It is hard to gauge the size of a formalisation in Nurpl, but
  1079   was not the bottleneck. It is hard to gauge the size of a formalisation in
  1080   from what is shown in the Nuprl Math Library about their development it
  1080   Nurpl, but from what is shown in the Nuprl Math Library about their
  1081   seems substantially larger than ours. The code of ours can be found in the
  1081   development it seems substantially larger than ours. The code of
  1082   Mercurial Repository at
  1082   ours can be found in the Mercurial Repository at
       
  1083 
  1083 
  1084   \begin{center}
  1084   \begin{center}
  1085   \url{http://www4.in.tum.de/~urbanc/cgi-bin/repos.cgi/regexp/}
  1085   \url{http://www4.in.tum.de/~urbanc/cgi-bin/repos.cgi/regexp/}
  1086   \end{center}
  1086   \end{center}
  1087 
  1087 
  1106   expressions and shows that there can be only finitely many of them. We could
  1106   expressions and shows that there can be only finitely many of them. We could
  1107   have used as the tag of a string @{text s} the derivative of a regular expression
  1107   have used as the tag of a string @{text s} the derivative of a regular expression
  1108   generated with respect to @{text s}.  Using the fact that two strings are
  1108   generated with respect to @{text s}.  Using the fact that two strings are
  1109   Myhill-Nerode related whenever their derivative is the same together with
  1109   Myhill-Nerode related whenever their derivative is the same together with
  1110   the fact that there are only finitely many derivatives for a regular
  1110   the fact that there are only finitely many derivatives for a regular
  1111   expression would give us the same argument as ours. However it seems not so easy to
  1111   expression would give us a similar argument as ours. However it seems not so easy to
  1112   calculate the derivatives and then to count them. Therefore we preferred our
  1112   calculate the derivatives and then to count them. Therefore we preferred our
  1113   direct method of using tagging-functions involving equivalence classes. This
  1113   direct method of using tagging-functions involving equivalence classes. This
  1114   is also where our method shines, because we can completely side-step the
  1114   is also where our method shines, because we can completely side-step the
  1115   standard argument \cite{Kozen97} where automata need to be composed, which
  1115   standard argument \cite{Kozen97} where automata need to be composed, which
  1116   as stated in the Introduction is not so convenient to formalise in a 
  1116   as stated in the Introduction is not so convenient to formalise in a