lexing: comparison thys/Paper/Paper.thy

equal deleted inserted replaced

-:8b41d01b5e5d
+:15ef2af1a6f2
 and only if \mbox{@{term "s \<in> L(der c r)"}}. The beauty of Brzozowski's
 derivatives is that they are neatly expressible in any functional language,
 and easily definable and reasoned about in theorem provers---the definitions
 just consist of inductive datatypes and simple recursive functions. A
 completely formalised correctness proof of this matcher in for example HOL4
-has been given in~\cite{Owens2008}.
+has been mentioned in~\cite{Owens2008}.
 One limitation of Brzozowski's matcher is that it only generates a YES/NO
 answer for whether a string is being matched by a regular expression.
 Sulzmann and Lu \cite{Sulzmann2014} extended this matcher to allow
 generation not just of a YES/NO answer but of an actual matching, called a
 element and that it is computed by their derivative-based algorithm. This
 proof idea is inspired by work of Frisch and Cardelli \cite{Frisch2004} on a
 GREEDY regular expression matching algorithm. Beginning with our
 observations that, without evidence that it is transitive, it cannot be
 called an ``order relation'', and that the relation is called a ``total
-order'' despite being evidently not total\footnote{\textcolor{green}{We
+order'' despite being evidently not total\footnote{The relation @{text
-should give an argument as footnote}}, we identify problems with this
+"\<ge>\<^bsub>r\<^esub>"} defined in \cite{Sulzmann2014} is a relation on the values for
-approach (of which some of the proofs are not published in
+the regular expression @{term r}; but it only holds between @{term v} and
-\cite{Sulzmann2014}); perhaps more importantly, we give a simple inductive
+@{term "v'"} in cases where @{term v} and @{term "v'"} have the same
-(and algorithm-independent) definition of what we call being a {\em POSIX
+flattening (underlying string). So a counterexample to totality is given by
-value} for a regular expression @{term r} and a string @{term s}; we show
+taking two values @{term v} and @{term "v'"} for @{term r} that have
-that the algorithm computes such a value and that such a value is unique.
+different flattenings. A different relation @{text "\<ge>\<^bsub>r,s\<^esub>"}
-Proofs are both done by hand and checked in Isabelle/HOL. The
+on the set of values for @{term r} with flattening @{term s} is definable by
-experience of doing our proofs has been that this mechanical checking was
+the same approach, and is indeed total; but that is not what Proposition 1
-absolutely essential: this subject area has hidden snares. This was also
+of \cite{Sulzmann2014} does.}, we identify problems with this approach (of
-noted by Kuklewitz \cite{Kuklewicz} who found that nearly all POSIX matching
+which some of the proofs are not published in \cite{Sulzmann2014}); perhaps
-implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}.
+more importantly, we give a simple inductive (and algorithm-independent)
+definition of what we call being a {\em POSIX value} for a regular
+expression @{term r} and a string @{term s}; we show that the algorithm
+computes such a value and that such a value is unique. Proofs are both done
+by hand and checked in Isabelle/HOL. The experience of doing our proofs has
+been that this mechanical checking was absolutely essential: this subject
+area has hidden snares. This was also noted by Kuklewitz \cite{Kuklewicz}
+who found that nearly all POSIX matching implementations are ``buggy''
+\cite[Page 203]{Sulzmann2014}.
 If a regular expression matches a string, then in general there is more than
 one way of how the string is matched. There are two commonly used
 disambiguation strategies to generate a unique answer: one is called GREEDY
 matching \cite{Frisch2004} and the other is POSIX
 expression can match the empty string. If yes, we call the function
 $mkeps$.
 \begin{figure}[t]
 \begin{center}
-\begin{tikzpicture}[scale=2,node distance=1.2cm,
+\begin{tikzpicture}[scale=2,node distance=1.3cm,
 every node/.style={minimum size=7mm}]
 \node (r1)  {@{term "r\<^sub>1"}};
 \node (r2) [right=of r1]{@{term "r\<^sub>2"}};
 \draw[->,line width=1mm](r1)--(r2) node[above,midway] {@{term "der a DUMMY"}};
 \node (r3) [right=of r2]{@{term "r\<^sub>3"}};
 \draw[->,line width=1mm](r3)--(r4) node[above,midway] {@{term "der c DUMMY"}};
 \draw (r4) node[anchor=west] {\;\raisebox{3mm}{@{term nullable}}};
 \node (v4) [below=of r4]{@{term "v\<^sub>4"}};
 \draw[->,line width=1mm](r4) -- (v4);
 \node (v3) [left=of v4] {@{term "v\<^sub>3"}};
-\draw[->,line width=1mm](v4)--(v3) node[below,midway] {@{term "inj c"}};
+\draw[->,line width=1mm](v4)--(v3) node[below,midway] {@{text "inj r\<^sub>3 c"}};
 \node (v2) [left=of v3]{@{term "v\<^sub>2"}};
-\draw[->,line width=1mm](v3)--(v2) node[below,midway] {@{term "inj b"}};
+\draw[->,line width=1mm](v3)--(v2) node[below,midway] {@{text "inj r\<^sub>2 b"}};
 \node (v1) [left=of v2] {@{term "v\<^sub>1"}};
-\draw[->,line width=1mm](v2)--(v1) node[below,midway] {@{term "inj a"}};
+\draw[->,line width=1mm](v2)--(v1) node[below,midway] {@{text "inj r\<^sub>1 a"}};
 \draw (r4) node[anchor=north west] {\;\raisebox{-8mm}{@{term "mkeps"}}};
 \end{tikzpicture}
 \end{center}
 \caption{The two phases of the algorithm by Sulzmann \& Lu \cite{Sulzmann2014}
-analysing the string @{term "[a,b,c]"}. The first phase (the arrows from
+matching the string @{term "[a,b,c]"}. The first phase (the arrows from
 left to right) is \Brz's matcher building succesive derivatives. If at the
 last regular expression is @{term nullable}, then functions of the
 second phase are called: first @{term mkeps} calculates a value witnessing
 how the empty string has been recognised by @{term "r\<^sub>4"}. After
 that the function @{term inj} `injects back' the characters of the string into

changeset 115	15ef2af1a6f2
parent 114	8b41d01b5e5d
child 116	022503caa187