lexing: comparison thys/Paper/Paper.thy


-:6b0a1976f89a
+:072a701bb153
 derivatives is that they are neatly expressible in any functional language,
 and easily definable and reasoned about in theorem provers---the definitions
 just consist of inductive datatypes and simple recursive functions. A
 completely formalised correctness proof of this matcher in, for example, HOL4
 has been mentioned by Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part
-of the work by Krauss and Nipkow \cite{Krauss2011}.
+of the work by Krauss and Nipkow \cite{Krauss2011}. A third one, in Coq, is given
+by Coquand and Siles \cite{Coquand2012}.
 One limitation of Brzozowski's matcher is that it only generates a YES/NO
 answer for whether a string is being matched by a regular expression.
 Sulzmann and Lu \cite{Sulzmann2014} extended this matcher to allow
 generation not just of a YES/NO answer but of an actual matching, called a
 The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a
 relation (called an ``order relation'') on the set of values of @{term r},
 and to show that (once a string to be matched is chosen) there is a maximum
 element and that it is computed by their derivative-based algorithm. This
 proof idea is inspired by work of Frisch and Cardelli \cite{Frisch2004} on a
-GREEDY regular expression matching algorithm. Beginning with our
-observations that, without evidence that it is transitive, it cannot be
-called an ``order relation'', and that the relation is called a ``total
-order'' despite being evidently not total\footnote{The relation @{text
-"\<ge>\<^bsub>r\<^esub>"} defined by Sulzmann and Lu \cite{Sulzmann2014} is a relation on the
-values for the regular expression @{term r}; but it only holds between
-@{term "v\<^sub>1"} and @{term "v\<^sub>2"} in cases where @{term "v\<^sub>1"} and @{term "v\<^sub>2"} have
-the same flattening (underlying string). So a counterexample to totality is
-given by taking two values @{term "v\<^sub>1"} and @{term "v\<^sub>2"} for @{term r} that
-have different flattenings (see Section~\ref{posixsec}). A different
-relation @{text "\<ge>\<^bsub>r,s\<^esub>"} on the set of values for @{term r}
-with flattening @{term s} is definable by the same approach, and is indeed
-total; but that is not what Proposition 1 of \cite{Sulzmann2014} does.}, we
-identify problems with this approach (of which some of the proofs are not
+GREEDY regular expression matching algorithm. However, we were not able to
+establish transitivity and totality for the ``order relation'' by
+Sulzmann and Lu. In Section \ref{argu} we
+identify problems with their approach (of which some of the proofs are not
 published in \cite{Sulzmann2014}); perhaps more importantly, we give a
 simple inductive (and algorithm-independent) definition of what we call
 being a {\em POSIX value} for a regular expression @{term r} and a string
 @{term s}; we show that the algorithm computes such a value and that such a
 value is unique. Proofs are both done by hand and checked in Isabelle/HOL.
 The experience of doing our proofs has been that this mechanical checking
 was absolutely essential: this subject area has hidden snares. This was also
 noted by Kuklewicz \cite{Kuklewicz} who found that nearly all POSIX matching
 implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}.
+%\footnote{The relation @{text "\<ge>\<^bsub>r\<^esub>"} defined by Sulzmann and Lu \cite{Sulzmann2014}
+%is a relation on the
+%values for the regular expression @{term r}; but it only holds between
+%@{term "v\<^sub>1"} and @{term "v\<^sub>2"} in cases where @{term "v\<^sub>1"} and @{term "v\<^sub>2"} have
+%the same flattening (underlying string). So a counterexample to totality is
+%given by taking two values @{term "v\<^sub>1"} and @{term "v\<^sub>2"} for @{term r} that
+%have different flattenings (see Section~\ref{posixsec}). A different
+%relation @{text "\<ge>\<^bsub>r,s\<^esub>"} on the set of values for @{term r}
+%with flattening @{term s} is definable by the same approach, and is indeed
+%total; but that is not what Proposition 1 of \cite{Sulzmann2014} does.}
 If a regular expression matches a string, then in general there is more than
 one way in which the string can be matched. There are two commonly used
 disambiguation strategies to generate a unique answer: one is called GREEDY
 matching \cite{Frisch2004} and the other is POSIX
 \noindent
 holds but for lack of space refer the reader to our mechanisation for details.
 *}
-section {* The Correctness Argument by Sulzmmann and Lu *}
+section {* The Correctness Argument by Sulzmann and Lu\label{argu} *}
 text {*
 %  \newcommand{\greedy}{\succcurlyeq_{gr}}
 \newcommand{\posix}{>}

changeset 169	072a701bb153
parent 165	ca4dcfd912cb
child 171	91647a8d84a3
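The text in this changeset contrasts Brzozowski's derivative-based matcher, which only gives a YES/NO answer, with Sulzmann and Lu's extension, which also produces a value recording how the string was matched. The following is a minimal Python sketch of both ideas, written under our own assumptions. The constructor names (ZERO, ONE, CHAR, ALT, SEQ, STAR for regular expressions; Empty, Chr, Left, Right, Seq, Stars for values) and the helpers nullable, der, matches, mkeps, inj, lexer and flatten are illustrative and are not taken from the Isabelle theory. The mkeps/inj injection step reflects our reading of the derivative-based value construction and is not a verified implementation.

```python
from dataclasses import dataclass

# Regular expressions (constructor names are illustrative assumptions).
@dataclass(frozen=True)
class ZERO: pass
@dataclass(frozen=True)
class ONE: pass
@dataclass(frozen=True)
class CHAR: c: str
@dataclass(frozen=True)
class ALT: r1: object; r2: object
@dataclass(frozen=True)
class SEQ: r1: object; r2: object
@dataclass(frozen=True)
class STAR: r: object
# Values: evidence of *how* a regular expression matched a string.
@dataclass(frozen=True)
class Empty: pass
@dataclass(frozen=True)
class Chr: c: str
@dataclass(frozen=True)
class Left: v: object
@dataclass(frozen=True)
class Right: v: object
@dataclass(frozen=True)
class Seq: v1: object; v2: object
@dataclass(frozen=True)
class Stars: vs: tuple

def nullable(r):  # can r match the empty string?
    if isinstance(r, ALT):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, SEQ):
        return nullable(r.r1) and nullable(r.r2)
    return isinstance(r, (ONE, STAR))  # ZERO and CHAR are not nullable

def der(c, r):  # Brzozowski derivative of r with respect to character c
    if isinstance(r, CHAR):
        return ONE() if r.c == c else ZERO()
    if isinstance(r, ALT):
        return ALT(der(c, r.r1), der(c, r.r2))
    if isinstance(r, SEQ):
        if nullable(r.r1):
            return ALT(SEQ(der(c, r.r1), r.r2), der(c, r.r2))
        return SEQ(der(c, r.r1), r.r2)
    if isinstance(r, STAR):
        return SEQ(der(c, r.r), r)
    return ZERO()  # ZERO, ONE

def matches(r, s):  # YES/NO matcher: derive by each character, test nullable
    for c in s:
        r = der(c, r)
    return nullable(r)

def mkeps(r):  # value for how a nullable r matches the empty string
    if isinstance(r, ONE):
        return Empty()
    if isinstance(r, ALT):
        return Left(mkeps(r.r1)) if nullable(r.r1) else Right(mkeps(r.r2))
    if isinstance(r, SEQ):
        return Seq(mkeps(r.r1), mkeps(r.r2))
    if isinstance(r, STAR):
        return Stars(())
    raise ValueError("mkeps: regular expression is not nullable")

def inj(r, c, v):  # turn a value for der(c, r) into a value for r
    if isinstance(r, CHAR) and isinstance(v, Empty):
        return Chr(c)
    if isinstance(r, ALT):
        return (Left(inj(r.r1, c, v.v)) if isinstance(v, Left)
                else Right(inj(r.r2, c, v.v)))
    if isinstance(r, SEQ):
        if isinstance(v, Seq):    # derivative was SEQ(der(c, r1), r2)
            return Seq(inj(r.r1, c, v.v1), v.v2)
        if isinstance(v, Left):   # left branch of ALT(SEQ(der(c, r1), r2), der(c, r2))
            return Seq(inj(r.r1, c, v.v.v1), v.v.v2)
        if isinstance(v, Right):  # right branch: r1 matched the empty string
            return Seq(mkeps(r.r1), inj(r.r2, c, v.v))
    if isinstance(r, STAR) and isinstance(v, Seq):
        return Stars((inj(r.r, c, v.v1),) + v.v2.vs)
    raise ValueError("inj: value does not fit der(c, r)")

def lexer(r, s):  # value for how r matches s, or None if there is no match
    if not s:
        return mkeps(r) if nullable(r) else None
    v = lexer(der(s[0], r), s[1:])
    return None if v is None else inj(r, s[0], v)

def flatten(v):  # the underlying string of a value
    if isinstance(v, Chr):
        return v.c
    if isinstance(v, (Left, Right)):
        return flatten(v.v)
    if isinstance(v, Seq):
        return flatten(v.v1) + flatten(v.v2)
    if isinstance(v, Stars):
        return "".join(flatten(w) for w in v.vs)
    return ""  # Empty
```

For `r = SEQ(ALT(CHAR("a"), SEQ(CHAR("a"), CHAR("b"))), ALT(CHAR("b"), ONE()))`, both `Seq(Right(Seq(Chr("a"), Chr("b"))), Right(Empty()))` and `Seq(Left(Chr("a")), Left(Chr("b")))` are values for `r` whose flattening is `"ab"`. The sketch's `lexer(r, "ab")` returns the first, in which the first component of the sequence consumes as much of the string as possible. No simplification of derivatives is attempted, so the intermediate regular expressions can grow quickly. That is a deliberate choice to keep the sketch short.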