# HG changeset patch # User Christian Urban # Date 1458112512 0 # Node ID e3eb82ea2244c517a7fdb11a48e9f11cdda87307 # Parent 5a1196466a9cfa21b52593cc0d41a2184fa284f2 updated diff -r 5a1196466a9c -r e3eb82ea2244 thys/Paper/Paper.thy --- a/thys/Paper/Paper.thy Tue Mar 15 01:10:38 2016 +0000 +++ b/thys/Paper/Paper.thy Wed Mar 16 07:15:12 2016 +0000 @@ -83,8 +83,8 @@ and easily definable and reasoned about in theorem provers---the definitions just consist of inductive datatypes and simple recursive functions. A completely formalised correctness proof of this matcher in for example HOL4 -has been mentioned in~\cite{Owens2008}. Another one in Isabelle/HOL is part -of the work in \cite{Krauss2011}. +has been mentioned by Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part +of the work by Krauss and Nipkow \cite{Krauss2011}. One limitation of Brzozowski's matcher is that it only generates a YES/NO answer for whether a string is being matched by a regular expression. @@ -106,11 +106,11 @@ observations that, without evidence that it is transitive, it cannot be called an ``order relation'', and that the relation is called a ``total order'' despite being evidently not total\footnote{The relation @{text -"\\<^bsub>r\<^esub>"} defined in \cite{Sulzmann2014} is a relation on the +"\\<^bsub>r\<^esub>"} defined by Sulzmann and Lu \cite{Sulzmann2014} is a relation on the values for the regular expression @{term r}; but it only holds between -@{term v} and @{term "v'"} in cases where @{term v} and @{term "v'"} have +@{term "v\<^sub>1"} and @{term "v\<^sub>2"} in cases where @{term "v\<^sub>1"} and @{term "v\<^sub>2"} have the same flattening (underlying string). So a counterexample to totality is -given by taking two values @{term v} and @{term "v'"} for @{term r} that +given by taking two values @{term "v\<^sub>1"} and @{term "v\<^sub>2"} for @{term r} that have different flattenings (see Section~\ref{posixsec}). A different relation @{text "\\<^bsub>r,s\<^esub>"} on the set of values for @{term r} with flattening @{term s} is definable by the same approach, and is indeed @@ -123,7 +123,7 @@ value is unique. Proofs are both done by hand and checked in Isabelle/HOL. The experience of doing our proofs has been that this mechanical checking was absolutely essential: this subject area has hidden snares. This was also -noted by Kuklewitz \cite{Kuklewicz} who found that nearly all POSIX matching +noted by Kuklewicz \cite{Kuklewicz} who found that nearly all POSIX matching implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}. If a regular expression matches a string, then in general there is more than @@ -150,12 +150,12 @@ expressions: \begin{itemize} -\item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):} +\item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):}\smallskip The longest initial substring matched by any regular expression is taken as next token.\smallskip -\item[$\bullet$] \underline{Priority Rule:} +\item[$\bullet$] \underline{Priority Rule:}\smallskip For a particular longest initial substring, the first regular expression that can match determines the token. @@ -333,7 +333,7 @@ text {* - The clever idea in \cite{Sulzmann2014} is to introduce values for encoding + The clever idea by Sulzmann and Lu \cite{Sulzmann2014} is to introduce values for encoding \emph{how} a regular expression matches a string and then define a function on values that mirrors (but inverts) the construction of the derivative on regular expressions. \emph{Values} are defined as the @@ -349,11 +349,12 @@ @{term "Stars vs"} \end{center} - \noindent where we use @{term vs} to stand for a list of values. (This is - similar to the approach taken by Frisch and Cardelli for GREEDY matching - \cite{Frisch2004}, and Sulzmann and Lu \cite{Sulzmann2014} for POSIX - matching). The string underlying a value can be calculated by the @{const - flat} function, written @{term "flat DUMMY"} and defined as: + \noindent where we use @{term vs} to stand for a list of + values. (This is similar to the approach taken by Frisch and + Cardelli for GREEDY matching \cite{Frisch2004}, and Sulzmann and Lu + for POSIX matching \cite{Sulzmann2014}). The string underlying a + value can be calculated by the @{const flat} function, written + @{term "flat DUMMY"} and defined as: \begin{center} \begin{tabular}{lcl} @@ -586,7 +587,7 @@ In contrast, we shall introduce a simple inductive definition that specifies directly what a \emph{POSIX value} is, incorporating the POSIX-specific choices into the side-conditions of our rules. Our - definition is inspired by the matching relation given in + definition is inspired by the matching relation given by Vansummeren \cite{Vansummeren2006}. The relation we define is ternary and written as \mbox{@{term "s \ r \ v"}}, relating strings, regular expressions and values. @@ -1096,12 +1097,13 @@ text {* - We have implemented the POSIX value calculation algorithm introduced in + We have implemented the POSIX value calculation algorithm introduced by + Sulzmann and Lu \cite{Sulzmann2014}. Our implementation is nearly identical to the original and all modifications we introduced are harmless (like our char-clause for @{text inj}). We have proved this algorithm to be correct, but correct according to our own specification of what POSIX values are. Our - specification (inspired from work in \cite{Vansummeren2006}) appears to be + specification (inspired from work by Vansummeren \cite{Vansummeren2006}) appears to be much simpler than in \cite{Sulzmann2014} and our proofs are nearly always straightforward. We have attempted to formalise the original proof by Sulzmann and Lu \cite{Sulzmann2014}, but we believe it contains @@ -1129,15 +1131,15 @@ computational. The algorithm by Sulzmann and Lu, in contrast, can be implemented with easy in any functional language. A bespoke lexer for the Imp-language is formalised in Coq as part of the Software Foundations book - \cite{Pierce2015}. The disadvantage of such bespoke lexers is that they + by Pierce et al\cite{Pierce2015}. The disadvantage of such bespoke lexers is that they do not generalise easily to more advanced features. Our formalisation is available from - \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex}. + \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex}.\medskip - %\noindent - %{\bf Acknowledgements:} - %We are grateful for the comments we received from anonymous - %referees. + \noindent + {\bf Acknowledgements:} + We are very grateful to Martin Sulzmann for his comments on our work and + also patiently explaining to us the details in \cite{Sulzmann2014}. \small \bibliographystyle{plain} \bibliography{root} diff -r 5a1196466a9c -r e3eb82ea2244 thys/Paper/document/root.tex --- a/thys/Paper/document/root.tex Tue Mar 15 01:10:38 2016 +0000 +++ b/thys/Paper/document/root.tex Wed Mar 16 07:15:12 2016 +0000 @@ -26,7 +26,7 @@ \renewcommand{\isasymequiv}{$\dn$} \renewcommand{\isasymemptyset}{$\varnothing$} \renewcommand{\isacharunderscore}{\mbox{$\_\!\_$}} -\renewcommand{\isacharprime}{\mbox{$\mbox{}\!\!\mbox{$^\prime$}$}} +%%\renewcommand{\isacharprime}{\makebox[0mm]{$\mbox{}\mbox{$\,^\prime$}$}} \definecolor{mygrey}{rgb}{.80,.80,.80} \def\Brz{Brzozowski} diff -r 5a1196466a9c -r e3eb82ea2244 thys/paper.pdf Binary file thys/paper.pdf has changed