--- a/thys/Journal/Paper.thy Tue Oct 10 11:31:47 2017 +0100
+++ b/thys/Journal/Paper.thy Wed Oct 25 12:18:44 2017 +0100
@@ -3,7 +3,7 @@
imports
"../Lexer"
"../Simplifying"
- "../Positions"
+ "../Positions"
"~~/src/HOL/Library/LaTeXsugar"
begin
@@ -507,7 +507,7 @@
ordering relation for values.}
\begin{center}
- \begin{tabular}{c@ {\hspace{12mm}}c}
+ \begin{tabular}{c@ {\hspace{12mm}}c}\label{prfintros}
\\[-8mm]
@{thm[mode=Axiom] Prf.intros(4)} &
@{thm[mode=Axiom] Prf.intros(5)[of "c"]}\\[4mm]
@@ -1552,145 +1552,7 @@
*}
-section {* Extensions*}
-text {*
-
- A strong point in favour of
- Sulzmann and Lu's algorithm is that it can be extended in various
-  ways. If we are interested in tokenising a string, then we need not just to
-  split up the string into tokens, but also to ``classify'' the tokens (for
-  example as keywords or identifiers). This can be done with
- only minor modifications to the algorithm by introducing \emph{record
- regular expressions} and \emph{record values} (for example
- \cite{Sulzmann2014b}):
-
- \begin{center}
- @{text "r :="}
- @{text "..."} $\mid$
- @{text "(l : r)"} \qquad\qquad
- @{text "v :="}
- @{text "..."} $\mid$
- @{text "(l : v)"}
- \end{center}
-
- \noindent where @{text l} is a label, say a string, @{text r} a regular
-  expression and @{text v} a value. All functions can be smoothly extended
-  to these regular expressions and values. For example \mbox{@{text "(l :
-  r)"}} is nullable iff @{term r} is, and so on. The purpose of a record
-  regular expression is to mark certain parts of a regular expression and
-  to record in the calculated value which substring was matched by each
-  marked part. The label can then serve as a classification for the tokens.
- For this recall the regular expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} for
- keywords and identifiers from the Introduction. With the record regular
- expression we can form \mbox{@{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"}}
-  and then traverse the calculated value, collecting only the underlying
-  strings of record values. With this we obtain finite sequences of pairs of
- labels and strings, for example
-
- \[@{text "(l\<^sub>1 : s\<^sub>1), ..., (l\<^sub>n : s\<^sub>n)"}\]
-
- \noindent from which tokens with classifications (keyword-token,
- identifier-token and so on) can be extracted.
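-
-  \noindent To make this concrete, the following is a small sketch in Haskell
-  (our illustration only; the datatype, constructor and function names are
-  hypothetical and not part of the formalisation) of record regular
-  expressions, record values and the collection of labelled substrings from a
-  value:
-
-  \begin{verbatim}
-  -- regular expressions and values, extended with a record constructor
-  data Rexp = Zero | One | Chr Char
-            | Seq Rexp Rexp | Alt Rexp Rexp | Star Rexp
-            | Rec String Rexp                 -- (l : r)
-
-  data Val  = Empty | Ch Char
-            | Sq Val Val | Lft Val | Rgt Val | Stars [Val]
-            | RecV String Val                 -- (l : v)
-
-  -- the string underlying a value
-  flat :: Val -> String
-  flat Empty       = ""
-  flat (Ch c)      = [c]
-  flat (Sq v1 v2)  = flat v1 ++ flat v2
-  flat (Lft v)     = flat v
-  flat (Rgt v)     = flat v
-  flat (Stars vs)  = concatMap flat vs
-  flat (RecV _ v)  = flat v
-
-  -- collect the labelled substrings recorded in a value
-  tokens :: Val -> [(String, String)]
-  tokens Empty       = []
-  tokens (Ch _)      = []
-  tokens (Sq v1 v2)  = tokens v1 ++ tokens v2
-  tokens (Lft v)     = tokens v
-  tokens (Rgt v)     = tokens v
-  tokens (Stars vs)  = concatMap tokens vs
-  tokens (RecV l v)  = (l, flat v) : tokens v
-  \end{verbatim}
-
-  \noindent Applied to the value calculated for the keyword/identifier
-  expression above, @{text "tokens"} returns the labelled substrings in the
-  order in which they were matched.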
-
-
-*}
-
-
-
-section {* The Correctness Argument by Sulzmann and Lu\label{argu} *}
-
-text {*
-% \newcommand{\greedy}{\succcurlyeq_{gr}}
- \newcommand{\posix}{>}
-
- An extended version of \cite{Sulzmann2014} is available at the website of
- its first author; this includes some ``proofs'', claimed in
- \cite{Sulzmann2014} to be ``rigorous''. Since these are evidently not in
- final form, we make no comment thereon, preferring to give general reasons
- for our belief that the approach of \cite{Sulzmann2014} is problematic.
- Their central definition is an ``ordering relation'' defined by the
- rules (slightly adapted to fit our notation):
-
- ??
-
- \noindent The idea behind the rules (A1) and (A2), for example, is that a
-  @{text Left}-value is bigger than a @{text Right}-value, if the underlying
-  string of the @{text Left}-value is at least as long as the underlying
-  string of the @{text Right}-value. The order is reversed,
- however, if the @{text Right}-value can match a longer string than a
- @{text Left}-value. In this way the POSIX value is supposed to be the
- biggest value for a given string and regular expression.
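-
-  For concreteness, a plausible formal reading of (A1) and (A2), reconstructed
-  by us from this informal description (and not necessarily Sulzmann and Lu's
-  exact rules), would be
-
-  \[
-  \frac{\textit{len}\,|v_2| \;\leq\; \textit{len}\,|v_1|}
-       {\textit{Left}\;v_1 \,\posix_{r_1 + r_2}\, \textit{Right}\;v_2}\;(A1)
-  \qquad
-  \frac{\textit{len}\,|v_1| \;<\; \textit{len}\,|v_2|}
-       {\textit{Right}\;v_2 \,\posix_{r_1 + r_2}\, \textit{Left}\;v_1}\;(A2)
-  \]
-
-  \noindent where $|v|$ stands for the string underlying the value $v$.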
-
- Sulzmann and Lu explicitly refer to the paper \cite{Frisch2004} by Frisch
- and Cardelli from where they have taken the idea for their correctness
- proof. Frisch and Cardelli introduced a similar ordering for GREEDY
- matching and they showed that their GREEDY matching algorithm always
- produces a maximal element according to this ordering (from all possible
- solutions). The only difference between their GREEDY ordering and the
- ``ordering'' by Sulzmann and Lu is that GREEDY always prefers a @{text
- Left}-value over a @{text Right}-value, no matter what the underlying
- string is. This seems to be only a very minor difference, but it has
- drastic consequences in terms of what properties both orderings enjoy.
- What is interesting for our purposes is that the properties reflexivity,
- totality and transitivity for this GREEDY ordering can be proved
- relatively easily by induction.
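-
-  Schematically, and again in our rendering rather than Frisch and Cardelli's
-  original notation, the GREEDY preference for @{text Left}-values carries no
-  side-condition on the underlying strings at all:
-
-  \[
-  \textit{Left}\;v_1 \;\succcurlyeq_{gr}\; \textit{Right}\;v_2
-  \]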
-*}
-
-
-section {* Conclusion *}
-
-text {*
-
- We have implemented the POSIX value calculation algorithm introduced by
- Sulzmann and Lu
- \cite{Sulzmann2014}. Our implementation is nearly identical to the
- original and all modifications we introduced are harmless (like our char-clause for
- @{text inj}). We have proved this algorithm to be correct, but correct
- according to our own specification of what POSIX values are. Our
-  specification (inspired by work of Vansummeren \cite{Vansummeren2006}) appears to be
-  much simpler than the one in \cite{Sulzmann2014} and our proofs are nearly always
- straightforward. We have attempted to formalise the original proof
- by Sulzmann and Lu \cite{Sulzmann2014}, but we believe it contains
- unfillable gaps. In the online version of \cite{Sulzmann2014}, the authors
- already acknowledge some small problems, but our experience suggests
-  that there are more serious ones.
-
- Having proved the correctness of the POSIX lexing algorithm in
-  \cite{Sulzmann2014}, what lessons have we learned? Well, this is a
-  perfect example of the importance of the \emph{right} definitions. We
-  have (on and off) explored mechanisations ever since the first versions
-  of \cite{Sulzmann2014} appeared, but have made little progress with
- turning the relatively detailed proof sketch in \cite{Sulzmann2014} into a
- formalisable proof. Having seen \cite{Vansummeren2006} and adapted the
- POSIX definition given there for the algorithm by Sulzmann and Lu made all
- the difference: the proofs, as said, are nearly straightforward. The
- question remains whether the original proof idea of \cite{Sulzmann2014},
-  potentially using our result as a stepping stone, can be made to work.
- Alas, we really do not know despite considerable effort.
-
-
- Closely related to our work is an automata-based lexer formalised by
- Nipkow \cite{Nipkow98}. This lexer also splits up strings into longest
- initial substrings, but Nipkow's algorithm is not completely
- computational. The algorithm by Sulzmann and Lu, in contrast, can be
- implemented with ease in any functional language. A bespoke lexer for the
- Imp-language is formalised in Coq as part of the Software Foundations book
-  by Pierce et al.~\cite{Pierce2015}. The disadvantage of such bespoke lexers is that they
- do not generalise easily to more advanced features.
- Our formalisation is available from the Archive of Formal Proofs \cite{aduAFP16}
- under \url{http://www.isa-afp.org/entries/Posix-Lexing.shtml}.\medskip
-
- \noindent
- {\bf Acknowledgements:}
- We are very grateful to Martin Sulzmann for his comments on our work and
- moreover for patiently explaining to us the details in \cite{Sulzmann2014}. We
- also received very helpful comments from James Cheney and anonymous referees.
- % \small
- \bibliographystyle{plain}
- \bibliography{root}
-
-*}
(*<*)