diff -r f754a10875c7 -r c840a99a3e05 thys/Journal/Paper.thy
--- a/thys/Journal/Paper.thy	Tue Oct 10 11:31:47 2017 +0100
+++ b/thys/Journal/Paper.thy	Wed Oct 25 12:18:44 2017 +0100
@@ -3,7 +3,7 @@
 imports "../Lexer"
         "../Simplifying"
-        "../Positions"
+        "../Positions" "~~/src/HOL/Library/LaTeXsugar"
 begin
@@ -507,7 +507,7 @@
   ordering relation for values.}

  \begin{center}
-  \begin{tabular}{c@ {\hspace{12mm}}c}
+  \begin{tabular}{c@ {\hspace{12mm}}c}\label{prfintros}
  \\[-8mm]
  @{thm[mode=Axiom] Prf.intros(4)} &
  @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\\[4mm]
@@ -1552,145 +1552,7 @@
 *}
-section {* Extensions *}
-text {*
-
-  A strong point in favour of Sulzmann and Lu's algorithm is that it can
-  be extended in various ways.  If we are interested in tokenising a
-  string, then we need not just to split up the string into tokens, but
-  also to ``classify'' the tokens (for example whether they are keywords
-  or identifiers).  This can be done with only minor modifications to the
-  algorithm by introducing \emph{record regular expressions} and
-  \emph{record values} (for example \cite{Sulzmann2014b}):
-
-  \begin{center}
-  @{text "r :="} @{text "..."} $\mid$ @{text "(l : r)"} \qquad\qquad
-  @{text "v :="} @{text "..."} $\mid$ @{text "(l : v)"}
-  \end{center}
-
-  \noindent where @{text l} is a label, say a string, @{text r} a regular
-  expression and @{text v} a value.  All functions can be smoothly
-  extended to these regular expressions and values.  For example
-  \mbox{@{text "(l : r)"}} is nullable iff @{term r} is, and so on.  The
-  purpose of a record regular expression is to mark certain parts of a
-  regular expression and then to record in the calculated value which
-  parts of the string were matched by these parts.  The labels can then
-  serve as classifications of the tokens.  For this recall the regular
-  expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} for keywords and identifiers from
-  the Introduction.  With record regular expressions we can form
-  \mbox{@{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"}} and then traverse the
-  calculated value and collect only the underlying strings in record
-  values.  With this we obtain finite sequences of pairs of labels and
-  strings, for example
-
-  \[@{text "(l\<^sub>1 : s\<^sub>1), ..., (l\<^sub>n : s\<^sub>n)"}\]
-
-  \noindent from which tokens with classifications (keyword-token,
-  identifier-token and so on) can be extracted (see the sketch following
-  this section).
-
-*}
-
-
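[Editorial aside, not part of this changeset: the record-regex extension
described in the removed section above can be made concrete in a few lines
of Isabelle/HOL. The following is a minimal, self-contained sketch; the
theory name RecordSketch and the identifiers ZERO, ONE, CHR, REC, Chrv,
Seqv, Leftv, Rightv, Recv, flat and env are our illustrative choices, not
names from the paper's formalisation. As described, nullable treats
(l : r) exactly like r, and env collects the labelled substrings recorded
in a value.]

theory RecordSketch
  imports Main
begin

(* Regular expressions with an extra record case REC l r,
   modelling the (l : r) regular expressions from above. *)
datatype rexp = ZERO | ONE | CHR char
              | SEQ rexp rexp | ALT rexp rexp | STAR rexp
              | REC string rexp

(* Values with the corresponding record case Recv l v. *)
datatype val = Void | Chrv char
             | Seqv val val | Leftv val | Rightv val
             | Stars "val list"
             | Recv string val

fun nullable :: "rexp ⇒ bool" where
  "nullable ZERO = False"
| "nullable ONE = True"
| "nullable (CHR c) = False"
| "nullable (SEQ r1 r2) = (nullable r1 ∧ nullable r2)"
| "nullable (ALT r1 r2) = (nullable r1 ∨ nullable r2)"
| "nullable (STAR r) = True"
| "nullable (REC l r) = nullable r"   (* (l : r) is nullable iff r is *)

(* The underlying (flattened) string of a value. *)
fun flat :: "val ⇒ string" where
  "flat Void = []"
| "flat (Chrv c) = [c]"
| "flat (Seqv v1 v2) = flat v1 @ flat v2"
| "flat (Leftv v) = flat v"
| "flat (Rightv v) = flat v"
| "flat (Stars vs) = concat (map flat vs)"
| "flat (Recv l v) = flat v"

(* Collect the pairs (l1 : s1), ..., (ln : sn) recorded in a value,
   in left-to-right order of the match. *)
fun env :: "val ⇒ (string × string) list" where
  "env Void = []"
| "env (Chrv c) = []"
| "env (Seqv v1 v2) = env v1 @ env v2"
| "env (Leftv v) = env v"
| "env (Rightv v) = env v"
| "env (Stars vs) = concat (map env vs)"
| "env (Recv l v) = (l, flat v) # env v"

end

[The output of env is exactly the finite sequence of label/string pairs
from which classified tokens can be extracted.]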
-section {* The Correctness Argument by Sulzmann and Lu\label{argu} *}
-
-text {*
-%  \newcommand{\greedy}{\succcurlyeq_{gr}}
-  \newcommand{\posix}{>}
-
-  An extended version of \cite{Sulzmann2014} is available at the website
-  of its first author; this includes some ``proofs'', claimed in
-  \cite{Sulzmann2014} to be ``rigorous''.  Since these are evidently not
-  in final form, we make no comment thereon, preferring to give general
-  reasons for our belief that the approach of \cite{Sulzmann2014} is
-  problematic.  Their central definition is an ``ordering relation''
-  defined by the rules (slightly adapted to fit our notation):
-
-  ??
-
-  \noindent The idea behind the rules (A1) and (A2), for example, is that
-  a @{text Left}-value is bigger than a @{text Right}-value if the
-  underlying string of the @{text Left}-value is longer than, or of equal
-  length to, the underlying string of the @{text Right}-value.  The order
-  is reversed, however, if the @{text Right}-value can match a longer
-  string than the @{text Left}-value (a conjectural rendering of these
-  two rules is sketched after this section).  In this way the POSIX value
-  is supposed to be the biggest value for a given string and regular
-  expression.
-
-  Sulzmann and Lu explicitly refer to the paper \cite{Frisch2004} by
-  Frisch and Cardelli, from which they have taken the idea for their
-  correctness proof.  Frisch and Cardelli introduced a similar ordering
-  for GREEDY matching and showed that their GREEDY matching algorithm
-  always produces a maximal element according to this ordering (from all
-  possible solutions).  The only difference between their GREEDY ordering
-  and the ``ordering'' by Sulzmann and Lu is that GREEDY always prefers a
-  @{text Left}-value over a @{text Right}-value, no matter what the
-  underlying string is.  This seems to be only a very minor difference,
-  but it has drastic consequences in terms of what properties both
-  orderings enjoy.  What is interesting for our purposes is that
-  reflexivity, totality and transitivity of this GREEDY ordering can be
-  proved relatively easily by induction.
-*}
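[Editorial aside, not part of this changeset: the display of Sulzmann and
Lu's ordering rules is lost above (the ?? placeholder). Purely as a
conjectural reconstruction from the prose description, and not the
authors' actual definition, rules (A1) and (A2) might be rendered along
the following lines, writing $|v|$ for the flattened string of a value
$v$, $\textit{len}$ for its length, and using the $\posix$ symbol defined
in the section above:]

  \begin{center}
  $\frac{\;\textit{len}\,|v_1| \,\geq\, \textit{len}\,|v_2|\;}
        {\;\textit{Left}\;v_1 \,\posix\, \textit{Right}\;v_2\;}\;(A1)
   \qquad
   \frac{\;\textit{len}\,|v_2| \,>\, \textit{len}\,|v_1|\;}
        {\;\textit{Right}\;v_2 \,\posix\, \textit{Left}\;v_1\;}\;(A2)$
  \end{center}

[(A1) makes a Left-value bigger whenever its underlying string is at
least as long; (A2) reverses the order when the Right-value matches a
strictly longer string.]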
-section {* Conclusion *}
-
-text {*
-
-  We have implemented the POSIX value calculation algorithm introduced by
-  Sulzmann and Lu \cite{Sulzmann2014}.  Our implementation is nearly
-  identical to the original and all modifications we introduced are
-  harmless (like our char-clause for @{text inj}).  We have proved this
-  algorithm to be correct, but correct according to our own specification
-  of what POSIX values are.  Our specification (inspired by work of
-  Vansummeren \cite{Vansummeren2006}) appears to be much simpler than the
-  one in \cite{Sulzmann2014}, and our proofs are nearly always
-  straightforward.  We have attempted to formalise the original proof by
-  Sulzmann and Lu \cite{Sulzmann2014}, but we believe it contains
-  unfillable gaps.  In the online version of \cite{Sulzmann2014}, the
-  authors already acknowledge some small problems, but our experience
-  suggests that there are more serious ones.
-
-  Having proved the correctness of the POSIX lexing algorithm in
-  \cite{Sulzmann2014}, what lessons have we learned?  Well, this is a
-  perfect example of the importance of the \emph{right} definitions.  We
-  have (on and off) explored mechanisations as soon as the first versions
-  of \cite{Sulzmann2014} appeared, but made little progress with turning
-  the relatively detailed proof sketch in \cite{Sulzmann2014} into a
-  formalisable proof.  Having seen \cite{Vansummeren2006} and adapted the
-  POSIX definition given there to the algorithm by Sulzmann and Lu made
-  all the difference: the proofs, as said, are nearly straightforward.
-  The question remains whether the original proof idea of
-  \cite{Sulzmann2014}, potentially using our result as a stepping stone,
-  can be made to work.  Alas, despite considerable effort, we really do
-  not know.
-
-  Closely related to our work is an automata-based lexer formalised by
-  Nipkow \cite{Nipkow98}.  This lexer also splits up strings into longest
-  initial substrings, but Nipkow's algorithm is not completely
-  computational.  The algorithm by Sulzmann and Lu, in contrast, can be
-  implemented with ease in any functional language.  A bespoke lexer for
-  the Imp-language is formalised in Coq as part of the Software
-  Foundations book by Pierce et al.\ \cite{Pierce2015}.  The disadvantage
-  of such bespoke lexers is that they do not generalise easily to more
-  advanced features.
-
-  Our formalisation is available from the Archive of Formal Proofs
-  \cite{aduAFP16} under
-  \url{http://www.isa-afp.org/entries/Posix-Lexing.shtml}.\medskip
-
-  \noindent
-  {\bf Acknowledgements:}
-  We are very grateful to Martin Sulzmann for his comments on our work
-  and moreover for patiently explaining to us the details in
-  \cite{Sulzmann2014}.  We also received very helpful comments from James
-  Cheney and anonymous referees.
-  %  \small
-  \bibliographystyle{plain}
-  \bibliography{root}
-
-*}
 (*<*)