thys/Paper/Paper.thy
changeset 169 072a701bb153
parent 165 ca4dcfd912cb
child 171 91647a8d84a3
equal deleted inserted replaced
168:6b0a1976f89a 169:072a701bb153
    82 derivatives is that they are neatly expressible in any functional language,
    82 derivatives is that they are neatly expressible in any functional language,
    83 and easily definable and reasoned about in theorem provers---the definitions
    83 and easily definable and reasoned about in theorem provers---the definitions
    84 just consist of inductive datatypes and simple recursive functions. A
    84 just consist of inductive datatypes and simple recursive functions. A
    85 completely formalised correctness proof of this matcher in for example HOL4
    85 completely formalised correctness proof of this matcher in for example HOL4
    86 has been mentioned by Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part
    86 has been mentioned by Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part
    87 of the work by Krauss and Nipkow \cite{Krauss2011}.
    87 of the work by Krauss and Nipkow \cite{Krauss2011}. And another one in Coq is given
       
    88 by Coquand and Siles \cite{Coquand2012}.
    88 
    89 
    89 One limitation of Brzozowski's matcher is that it only generates a YES/NO
    90 One limitation of Brzozowski's matcher is that it only generates a YES/NO
    90 answer for whether a string is being matched by a regular expression.
    91 answer for whether a string is being matched by a regular expression.
    91 Sulzmann and Lu \cite{Sulzmann2014} extended this matcher to allow
    92 Sulzmann and Lu \cite{Sulzmann2014} extended this matcher to allow
    92 generation not just of a YES/NO answer but of an actual matching, called a
    93 generation not just of a YES/NO answer but of an actual matching, called a
   100 The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a
   101 The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a
   101 relation (called an ``order relation'') on the set of values of @{term r},
   102 relation (called an ``order relation'') on the set of values of @{term r},
   102 and to show that (once a string to be matched is chosen) there is a maximum
   103 and to show that (once a string to be matched is chosen) there is a maximum
   103 element and that it is computed by their derivative-based algorithm. This
   104 element and that it is computed by their derivative-based algorithm. This
   104 proof idea is inspired by work of Frisch and Cardelli \cite{Frisch2004} on a
   105 proof idea is inspired by work of Frisch and Cardelli \cite{Frisch2004} on a
   105 GREEDY regular expression matching algorithm. Beginning with our
   106 GREEDY regular expression matching algorithm. However, we were not able to 
   106 observations that, without evidence that it is transitive, it cannot be
   107 establish transitivity and totality for the ``order relation'' by 
   107 called an ``order relation'', and that the relation is called a ``total
   108 Sulzmann and Lu. In Section \ref{arg} we
   108 order'' despite being evidently not total\footnote{The relation @{text
   109 identify problems with their approach (of which some of the proofs are not
   109 "\<ge>\<^bsub>r\<^esub>"} defined by Sulzmann and Lu \cite{Sulzmann2014} is a relation on the
       
   110 values for the regular expression @{term r}; but it only holds between
       
   111 @{term "v\<^sub>1"} and @{term "v\<^sub>2"} in cases where @{term "v\<^sub>1"} and @{term "v\<^sub>2"} have
       
   112 the same flattening (underlying string). So a counterexample to totality is
       
   113 given by taking two values @{term "v\<^sub>1"} and @{term "v\<^sub>2"} for @{term r} that
       
   114 have different flattenings (see Section~\ref{posixsec}). A different
       
   115 relation @{text "\<ge>\<^bsub>r,s\<^esub>"} on the set of values for @{term r}
       
   116 with flattening @{term s} is definable by the same approach, and is indeed
       
   117 total; but that is not what Proposition 1 of \cite{Sulzmann2014} does.}, we
       
   118 identify problems with this approach (of which some of the proofs are not
       
   119 published in \cite{Sulzmann2014}); perhaps more importantly, we give a
   110 published in \cite{Sulzmann2014}); perhaps more importantly, we give a
   120 simple inductive (and algorithm-independent) definition of what we call
   111 simple inductive (and algorithm-independent) definition of what we call
   121 being a {\em POSIX value} for a regular expression @{term r} and a string
   112 being a {\em POSIX value} for a regular expression @{term r} and a string
   122 @{term s}; we show that the algorithm computes such a value and that such a
   113 @{term s}; we show that the algorithm computes such a value and that such a
   123 value is unique. Proofs are both done by hand and checked in Isabelle/HOL.
   114 value is unique. Proofs are both done by hand and checked in Isabelle/HOL.
   124 The experience of doing our proofs has been that this mechanical checking
   115 The experience of doing our proofs has been that this mechanical checking
   125 was absolutely essential: this subject area has hidden snares. This was also
   116 was absolutely essential: this subject area has hidden snares. This was also
   126 noted by Kuklewicz \cite{Kuklewicz} who found that nearly all POSIX matching
   117 noted by Kuklewicz \cite{Kuklewicz} who found that nearly all POSIX matching
   127 implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}.
   118 implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}.
       
   119 
       
   120 %\footnote{The relation @{text "\<ge>\<^bsub>r\<^esub>"} defined by Sulzmann and Lu \cite{Sulzmann2014} 
       
   121 %is a relation on the
       
   122 %values for the regular expression @{term r}; but it only holds between
       
   123 %@{term "v\<^sub>1"} and @{term "v\<^sub>2"} in cases where @{term "v\<^sub>1"} and @{term "v\<^sub>2"} have
       
   124 %the same flattening (underlying string). So a counterexample to totality is
       
   125 %given by taking two values @{term "v\<^sub>1"} and @{term "v\<^sub>2"} for @{term r} that
       
   126 %have different flattenings (see Section~\ref{posixsec}). A different
       
   127 %relation @{text "\<ge>\<^bsub>r,s\<^esub>"} on the set of values for @{term r}
       
   128 %with flattening @{term s} is definable by the same approach, and is indeed
       
   129 %total; but that is not what Proposition 1 of \cite{Sulzmann2014} does.}
       
   130 
   128 
   131 
   129 If a regular expression matches a string, then in general there is more than
   132 If a regular expression matches a string, then in general there is more than
   130 one way of how the string is matched. There are two commonly used
   133 one way of how the string is matched. There are two commonly used
   131 disambiguation strategies to generate a unique answer: one is called GREEDY
   134 disambiguation strategies to generate a unique answer: one is called GREEDY
   132 matching \cite{Frisch2004} and the other is POSIX
   135 matching \cite{Frisch2004} and the other is POSIX
   900   \noindent
   903   \noindent
   901   holds but for lack of space refer the reader to our mechanisation for details.
   904   holds but for lack of space refer the reader to our mechanisation for details.
   902 
   905 
   903 *}
   906 *}
   904 
   907 
   905 section {* The Correctness Argument by Sulzmmann and Lu *}
   908 section {* The Correctness Argument by Sulzmmann and Lu\label{argu} *}
   906 
   909 
   907 text {*
   910 text {*
   908 %  \newcommand{\greedy}{\succcurlyeq_{gr}}
   911 %  \newcommand{\greedy}{\succcurlyeq_{gr}}
   909  \newcommand{\posix}{>}
   912  \newcommand{\posix}{>}
   910 
   913