# HG changeset patch # User Christian Urban # Date 1457179801 0 # Node ID 15ef2af1a6f2c5405eb23385f40639d1052688d0 # Parent 8b41d01b5e5d8e3f945f333890b0a00ce429652f updated diff -r 8b41d01b5e5d -r 15ef2af1a6f2 thys/Paper/Paper.thy --- a/thys/Paper/Paper.thy Sat Mar 05 05:41:51 2016 +0000 +++ b/thys/Paper/Paper.thy Sat Mar 05 12:10:01 2016 +0000 @@ -64,7 +64,7 @@ and easily definable and reasoned about in theorem provers---the definitions just consist of inductive datatypes and simple recursive functions. A completely formalised correctness proof of this matcher in for example HOL4 -has been given in~\cite{Owens2008}. +has been mentioned in~\cite{Owens2008}. One limitation of Brzozowski's matcher is that it only generates a YES/NO answer for whether a string is being matched by a regular expression. @@ -85,18 +85,26 @@ GREEDY regular expression matching algorithm. Beginning with our observations that, without evidence that it is transitive, it cannot be called an ``order relation'', and that the relation is called a ``total -order'' despite being evidently not total\footnote{\textcolor{green}{We -should give an argument as footnote}}, we identify problems with this -approach (of which some of the proofs are not published in -\cite{Sulzmann2014}); perhaps more importantly, we give a simple inductive -(and algorithm-independent) definition of what we call being a {\em POSIX -value} for a regular expression @{term r} and a string @{term s}; we show -that the algorithm computes such a value and that such a value is unique. -Proofs are both done by hand and checked in Isabelle/HOL. The -experience of doing our proofs has been that this mechanical checking was -absolutely essential: this subject area has hidden snares. This was also -noted by Kuklewitz \cite{Kuklewicz} who found that nearly all POSIX matching -implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}. +order'' despite being evidently not total\footnote{The relation @{text +"\\<^bsub>r\<^esub>"} defined in \cite{Sulzmann2014} is a relation on the values for +the regular expression @{term r}; but it only holds between @{term v} and +@{term "v'"} in cases where @{term v} and @{term "v'"} have the same +flattening (underlying string). So a counterexample to totality is given by +taking two values @{term v} and @{term "v'"} for @{term r} that have +different flattenings. A different relation @{text "\\<^bsub>r,s\<^esub>"} +on the set of values for @{term r} with flattening @{term s} is definable by +the same approach, and is indeed total; but that is not what Proposition 1 +of \cite{Sulzmann2014} does.}, we identify problems with this approach (of +which some of the proofs are not published in \cite{Sulzmann2014}); perhaps +more importantly, we give a simple inductive (and algorithm-independent) +definition of what we call being a {\em POSIX value} for a regular +expression @{term r} and a string @{term s}; we show that the algorithm +computes such a value and that such a value is unique. Proofs are both done +by hand and checked in Isabelle/HOL. The experience of doing our proofs has +been that this mechanical checking was absolutely essential: this subject +area has hidden snares. This was also noted by Kuklewitz \cite{Kuklewicz} +who found that nearly all POSIX matching implementations are ``buggy'' +\cite[Page 203]{Sulzmann2014}. If a regular expression matches a string, then in general there is more than one way of how the string is matched. There are two commonly used @@ -369,7 +377,7 @@ \begin{figure}[t] \begin{center} -\begin{tikzpicture}[scale=2,node distance=1.2cm, +\begin{tikzpicture}[scale=2,node distance=1.3cm, every node/.style={minimum size=7mm}] \node (r1) {@{term "r\<^sub>1"}}; \node (r2) [right=of r1]{@{term "r\<^sub>2"}}; @@ -382,16 +390,16 @@ \node (v4) [below=of r4]{@{term "v\<^sub>4"}}; \draw[->,line width=1mm](r4) -- (v4); \node (v3) [left=of v4] {@{term "v\<^sub>3"}}; -\draw[->,line width=1mm](v4)--(v3) node[below,midway] {@{term "inj c"}}; +\draw[->,line width=1mm](v4)--(v3) node[below,midway] {@{text "inj r\<^sub>3 c"}}; \node (v2) [left=of v3]{@{term "v\<^sub>2"}}; -\draw[->,line width=1mm](v3)--(v2) node[below,midway] {@{term "inj b"}}; +\draw[->,line width=1mm](v3)--(v2) node[below,midway] {@{text "inj r\<^sub>2 b"}}; \node (v1) [left=of v2] {@{term "v\<^sub>1"}}; -\draw[->,line width=1mm](v2)--(v1) node[below,midway] {@{term "inj a"}}; +\draw[->,line width=1mm](v2)--(v1) node[below,midway] {@{text "inj r\<^sub>1 a"}}; \draw (r4) node[anchor=north west] {\;\raisebox{-8mm}{@{term "mkeps"}}}; \end{tikzpicture} \end{center} \caption{The two phases of the algorithm by Sulzmann \& Lu \cite{Sulzmann2014} -analysing the string @{term "[a,b,c]"}. The first phase (the arrows from +matching the string @{term "[a,b,c]"}. The first phase (the arrows from left to right) is \Brz's matcher building succesive derivatives. If at the last regular expression is @{term nullable}, then functions of the second phase are called: first @{term mkeps} calculates a value witnessing diff -r 8b41d01b5e5d -r 15ef2af1a6f2 thys/paper.pdf Binary file thys/paper.pdf has changed