--- a/thys/Paper/Paper.thy Tue Mar 15 01:10:38 2016 +0000
+++ b/thys/Paper/Paper.thy Wed Mar 16 07:15:12 2016 +0000
@@ -83,8 +83,8 @@
and easily definable and reasoned about in theorem provers---the definitions
just consist of inductive datatypes and simple recursive functions. A
completely formalised correctness proof of this matcher in for example HOL4
-has been mentioned in~\cite{Owens2008}. Another one in Isabelle/HOL is part
-of the work in \cite{Krauss2011}.
+has been mentioned by Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part
+of the work by Krauss and Nipkow \cite{Krauss2011}.
One limitation of Brzozowski's matcher is that it only generates a YES/NO
answer for whether a string is being matched by a regular expression.
@@ -106,11 +106,11 @@
observations that, without evidence that it is transitive, it cannot be
called an ``order relation'', and that the relation is called a ``total
order'' despite being evidently not total\footnote{The relation @{text
-"\<ge>\<^bsub>r\<^esub>"} defined in \cite{Sulzmann2014} is a relation on the
+"\<ge>\<^bsub>r\<^esub>"} defined by Sulzmann and Lu \cite{Sulzmann2014} is a relation on the
values for the regular expression @{term r}; but it only holds between
-@{term v} and @{term "v'"} in cases where @{term v} and @{term "v'"} have
+@{term "v\<^sub>1"} and @{term "v\<^sub>2"} in cases where @{term "v\<^sub>1"} and @{term "v\<^sub>2"} have
the same flattening (underlying string). So a counterexample to totality is
-given by taking two values @{term v} and @{term "v'"} for @{term r} that
+given by taking two values @{term "v\<^sub>1"} and @{term "v\<^sub>2"} for @{term r} that
have different flattenings (see Section~\ref{posixsec}). A different
relation @{text "\<ge>\<^bsub>r,s\<^esub>"} on the set of values for @{term r}
with flattening @{term s} is definable by the same approach, and is indeed
@@ -123,7 +123,7 @@
value is unique. Proofs are both done by hand and checked in Isabelle/HOL.
The experience of doing our proofs has been that this mechanical checking
was absolutely essential: this subject area has hidden snares. This was also
-noted by Kuklewitz \cite{Kuklewicz} who found that nearly all POSIX matching
+noted by Kuklewicz \cite{Kuklewicz} who found that nearly all POSIX matching
implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}.
If a regular expression matches a string, then in general there is more than
@@ -150,12 +150,12 @@
expressions:
\begin{itemize}
-\item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):}
+\item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):}\smallskip
The longest initial substring matched by any regular expression is taken as
next token.\smallskip
-\item[$\bullet$] \underline{Priority Rule:}
+\item[$\bullet$] \underline{Priority Rule:}\smallskip
For a particular longest initial substring, the first regular expression
that can match determines the token.
@@ -333,7 +333,7 @@
text {*
- The clever idea in \cite{Sulzmann2014} is to introduce values for encoding
+ The clever idea by Sulzmann and Lu \cite{Sulzmann2014} is to introduce values for encoding
\emph{how} a regular expression matches a string and then define a
function on values that mirrors (but inverts) the construction of the
derivative on regular expressions. \emph{Values} are defined as the
@@ -349,11 +349,12 @@
@{term "Stars vs"}
\end{center}
- \noindent where we use @{term vs} to stand for a list of values. (This is
- similar to the approach taken by Frisch and Cardelli for GREEDY matching
- \cite{Frisch2004}, and Sulzmann and Lu \cite{Sulzmann2014} for POSIX
- matching). The string underlying a value can be calculated by the @{const
- flat} function, written @{term "flat DUMMY"} and defined as:
+ \noindent where we use @{term vs} to stand for a list of
+ values. (This is similar to the approach taken by Frisch and
+ Cardelli for GREEDY matching \cite{Frisch2004}, and Sulzmann and Lu
+ for POSIX matching \cite{Sulzmann2014}). The string underlying a
+ value can be calculated by the @{const flat} function, written
+ @{term "flat DUMMY"} and defined as:
\begin{center}
\begin{tabular}{lcl}
@@ -586,7 +587,7 @@
In contrast, we shall introduce a simple inductive definition that
specifies directly what a \emph{POSIX value} is, incorporating the
POSIX-specific choices into the side-conditions of our rules. Our
- definition is inspired by the matching relation given in
+ definition is inspired by the matching relation given by Vansummeren
\cite{Vansummeren2006}. The relation we define is ternary and written as
\mbox{@{term "s \<in> r \<rightarrow> v"}}, relating strings, regular expressions and
values.
@@ -1096,12 +1097,13 @@
text {*
- We have implemented the POSIX value calculation algorithm introduced in
+ We have implemented the POSIX value calculation algorithm introduced by
+ Sulzmann and Lu
\cite{Sulzmann2014}. Our implementation is nearly identical to the
original and all modifications we introduced are harmless (like our char-clause for
@{text inj}). We have proved this algorithm to be correct, but correct
according to our own specification of what POSIX values are. Our
- specification (inspired from work in \cite{Vansummeren2006}) appears to be
+ specification (inspired from work by Vansummeren \cite{Vansummeren2006}) appears to be
much simpler than in \cite{Sulzmann2014} and our proofs are nearly always
straightforward. We have attempted to formalise the original proof
by Sulzmann and Lu \cite{Sulzmann2014}, but we believe it contains
@@ -1129,15 +1131,15 @@
computational. The algorithm by Sulzmann and Lu, in contrast, can be
implemented with easy in any functional language. A bespoke lexer for the
Imp-language is formalised in Coq as part of the Software Foundations book
- \cite{Pierce2015}. The disadvantage of such bespoke lexers is that they
+ by Pierce et al\cite{Pierce2015}. The disadvantage of such bespoke lexers is that they
do not generalise easily to more advanced features.
Our formalisation is available from
- \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex}.
+ \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex}.\medskip
- %\noindent
- %{\bf Acknowledgements:}
- %We are grateful for the comments we received from anonymous
- %referees.
+ \noindent
+ {\bf Acknowledgements:}
+ We are very grateful to Martin Sulzmann for his comments on our work and
+ also patiently explaining to us the details in \cite{Sulzmann2014}.
\small
\bibliographystyle{plain}
\bibliography{root}
--- a/thys/Paper/document/root.tex Tue Mar 15 01:10:38 2016 +0000
+++ b/thys/Paper/document/root.tex Wed Mar 16 07:15:12 2016 +0000
@@ -26,7 +26,7 @@
\renewcommand{\isasymequiv}{$\dn$}
\renewcommand{\isasymemptyset}{$\varnothing$}
\renewcommand{\isacharunderscore}{\mbox{$\_\!\_$}}
-\renewcommand{\isacharprime}{\mbox{$\mbox{}\!\!\mbox{$^\prime$}$}}
+%%\renewcommand{\isacharprime}{\makebox[0mm]{$\mbox{}\mbox{$\,^\prime$}$}}
\definecolor{mygrey}{rgb}{.80,.80,.80}
\def\Brz{Brzozowski}
Binary file thys/paper.pdf has changed