updated
authorChristian Urban <christian dot urban at kcl dot ac dot uk>
Sat, 05 Mar 2016 12:10:01 +0000
changeset 115 15ef2af1a6f2
parent 114 8b41d01b5e5d
child 116 022503caa187
updated
thys/Paper/Paper.thy
thys/paper.pdf
--- a/thys/Paper/Paper.thy	Sat Mar 05 05:41:51 2016 +0000
+++ b/thys/Paper/Paper.thy	Sat Mar 05 12:10:01 2016 +0000
@@ -64,7 +64,7 @@
 and easily definable and reasoned about in theorem provers---the definitions
 just consist of inductive datatypes and simple recursive functions. A
 completely formalised correctness proof of this matcher in for example HOL4
-has been given in~\cite{Owens2008}.
+has been mentioned in~\cite{Owens2008}.
 
 One limitation of Brzozowski's matcher is that it only generates a YES/NO
 answer for whether a string is being matched by a regular expression.
@@ -85,18 +85,26 @@
 GREEDY regular expression matching algorithm. Beginning with our
 observations that, without evidence that it is transitive, it cannot be
 called an ``order relation'', and that the relation is called a ``total
-order'' despite being evidently not total\footnote{\textcolor{green}{We
-should give an argument as footnote}}, we identify problems with this
-approach (of which some of the proofs are not published in
-\cite{Sulzmann2014}); perhaps more importantly, we give a simple inductive
-(and algorithm-independent) definition of what we call being a {\em POSIX
-value} for a regular expression @{term r} and a string @{term s}; we show
-that the algorithm computes such a value and that such a value is unique.
-Proofs are both done by hand and checked in Isabelle/HOL. The
-experience of doing our proofs has been that this mechanical checking was
-absolutely essential: this subject area has hidden snares. This was also
-noted by Kuklewitz \cite{Kuklewicz} who found that nearly all POSIX matching
-implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}.
+order'' despite being evidently not total\footnote{The relation @{text
+"\<ge>\<^bsub>r\<^esub>"} defined in \cite{Sulzmann2014} is a relation on the values for
+the regular expression @{term r}; but it only holds between @{term v} and
+@{term "v'"} in cases where @{term v} and @{term "v'"} have the same
+flattening (underlying string). So a counterexample to totality is given by
+taking two values @{term v} and @{term "v'"} for @{term r} that have
+different flattenings. A different relation @{text "\<ge>\<^bsub>r,s\<^esub>"}
+on the set of values for @{term r} with flattening @{term s} is definable by
+the same approach, and is indeed total; but that is not what Proposition 1
+of \cite{Sulzmann2014} does.}, we identify problems with this approach (of
+which some of the proofs are not published in \cite{Sulzmann2014}); perhaps
+more importantly, we give a simple inductive (and algorithm-independent)
+definition of what we call being a {\em POSIX value} for a regular
+expression @{term r} and a string @{term s}; we show that the algorithm
+computes such a value and that such a value is unique. Proofs are both done
+by hand and checked in Isabelle/HOL. The experience of doing our proofs has
+been that this mechanical checking was absolutely essential: this subject
+area has hidden snares. This was also noted by Kuklewitz \cite{Kuklewicz}
+who found that nearly all POSIX matching implementations are ``buggy''
+\cite[Page 203]{Sulzmann2014}.
 
 If a regular expression matches a string, then in general there is more than
 one way of how the string is matched. There are two commonly used
@@ -369,7 +377,7 @@
 
 \begin{figure}[t]
 \begin{center}
-\begin{tikzpicture}[scale=2,node distance=1.2cm,
+\begin{tikzpicture}[scale=2,node distance=1.3cm,
                     every node/.style={minimum size=7mm}]
 \node (r1)  {@{term "r\<^sub>1"}};
 \node (r2) [right=of r1]{@{term "r\<^sub>2"}};
@@ -382,16 +390,16 @@
 \node (v4) [below=of r4]{@{term "v\<^sub>4"}};
 \draw[->,line width=1mm](r4) -- (v4);
 \node (v3) [left=of v4] {@{term "v\<^sub>3"}};
-\draw[->,line width=1mm](v4)--(v3) node[below,midway] {@{term "inj c"}};
+\draw[->,line width=1mm](v4)--(v3) node[below,midway] {@{text "inj r\<^sub>3 c"}};
 \node (v2) [left=of v3]{@{term "v\<^sub>2"}};
-\draw[->,line width=1mm](v3)--(v2) node[below,midway] {@{term "inj b"}};
+\draw[->,line width=1mm](v3)--(v2) node[below,midway] {@{text "inj r\<^sub>2 b"}};
 \node (v1) [left=of v2] {@{term "v\<^sub>1"}};
-\draw[->,line width=1mm](v2)--(v1) node[below,midway] {@{term "inj a"}};
+\draw[->,line width=1mm](v2)--(v1) node[below,midway] {@{text "inj r\<^sub>1 a"}};
 \draw (r4) node[anchor=north west] {\;\raisebox{-8mm}{@{term "mkeps"}}};
 \end{tikzpicture}
 \end{center}
 \caption{The two phases of the algorithm by Sulzmann \& Lu \cite{Sulzmann2014}
-analysing the string @{term "[a,b,c]"}. The first phase (the arrows from 
+matching the string @{term "[a,b,c]"}. The first phase (the arrows from 
 left to right) is \Brz's matcher building succesive derivatives. If at the 
 last regular expression is @{term nullable}, then functions of the 
 second phase are called: first @{term mkeps} calculates a value witnessing
Binary file thys/paper.pdf has changed