lexing: comparison thys/Journal/Paper.thy

equal deleted inserted replaced

-:f754a10875c7
+:c840a99a3e05
 (*<*)
 theory Paper
 imports
 "../Lexer"
 "../Simplifying"
 "../Positions"
 "~~/src/HOL/Library/LaTeXsugar"
 begin
 lemma Suc_0_fold:
 "Suc 0 = 1"
 string. The reason for introducing the more restricted version of
 lexical values is convenience later on when reasoning about an
 ordering relation for values.}
 \begin{center}
-\begin{tabular}{c@ {\hspace{12mm}}c}
+\begin{tabular}{c@ {\hspace{12mm}}c}\label{prfintros}
 \\[-8mm]
 @{thm[mode=Axiom] Prf.intros(4)} &
 @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\\[4mm]
 @{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]} &
 @{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>1" "r\<^sub>2"]}\\[4mm]
 \end{proof}
 *}
-section {* Extensions*}
-text {*
-A strong point in favour of
-Sulzmann and Lu's algorithm is that it can be extended in various
-ways.  If we are interested in tokenising a string, then we need to not just
-split up the string into tokens, but also ``classify'' the tokens (for
-example whether it is a keyword or an identifier). This can be done with
-only minor modifications to the algorithm by introducing \emph{record
-regular expressions} and \emph{record values} (for example
-\cite{Sulzmann2014b}):
-\begin{center}
-@{text "r :="}
-@{text "..."} $\mid$
-@{text "(l : r)"} \qquad\qquad
-@{text "v :="}
-@{text "..."} $\mid$
-@{text "(l : v)"}
-\end{center}
-\noindent where @{text l} is a label, say a string, @{text r} a regular
-expression and @{text v} a value. All functions can be smoothly extended
-to these regular expressions and values. For example \mbox{@{text "(l :
-r)"}} is nullable iff @{term r} is, and so on. The purpose of the record
-regular expression is to mark certain parts of a regular expression and
-then record in the calculated value which parts of the string were matched
-by this part. The label can then serve as classification for the tokens.
-For this recall the regular expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} for
-keywords and identifiers from the Introduction. With the record regular
-expression we can form \mbox{@{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"}}
-and then traverse the calculated value and only collect the underlying
-strings in record values. With this we obtain finite sequences of pairs of
-labels and strings, for example
-\[@{text "(l\<^sub>1 : s\<^sub>1), ..., (l\<^sub>n : s\<^sub>n)"}\]
-\noindent from which tokens with classifications (keyword-token,
-identifier-token and so on) can be extracted.
-*}
-section {* The Correctness Argument by Sulzmann and Lu\label{argu} *}
-text {*
-%  \newcommand{\greedy}{\succcurlyeq_{gr}}
-\newcommand{\posix}{>}
-An extended version of \cite{Sulzmann2014} is available at the website of
-its first author; this includes some ``proofs'', claimed in
-\cite{Sulzmann2014} to be ``rigorous''. Since these are evidently not in
-final form, we make no comment thereon, preferring to give general reasons
-for our belief that the approach of \cite{Sulzmann2014} is problematic.
-Their central definition is an ``ordering relation'' defined by the
-rules (slightly adapted to fit our notation):
-??
-\noindent The idea behind the rules (A1) and (A2), for example, is that a
-@{text Left}-value is bigger than a @{text Right}-value, if the underlying
-string of the @{text Left}-value is longer or of equal length to the
-underlying string of the @{text Right}-value. The order is reversed,
-however, if the @{text Right}-value can match a longer string than a
-@{text Left}-value. In this way the POSIX value is supposed to be the
-biggest value for a given string and regular expression.
-Sulzmann and Lu explicitly refer to the paper \cite{Frisch2004} by Frisch
-and Cardelli from where they have taken the idea for their correctness
-proof. Frisch and Cardelli introduced a similar ordering for GREEDY
-matching and they showed that their GREEDY matching algorithm always
-produces a maximal element according to this ordering (from all possible
-solutions). The only difference between their GREEDY ordering and the
-``ordering'' by Sulzmann and Lu is that GREEDY always prefers a @{text
-Left}-value over a @{text Right}-value, no matter what the underlying
-string is. This seems to be only a very minor difference, but it has
-drastic consequences in terms of what properties both orderings enjoy.
-What is interesting for our purposes is that the properties reflexivity,
-totality and transitivity for this GREEDY ordering can be proved
-relatively easily by induction.
-*}
-section {* Conclusion *}
-text {*
-We have implemented the POSIX value calculation algorithm introduced by
-Sulzmann and Lu
-\cite{Sulzmann2014}. Our implementation is nearly identical to the
-original and all modifications we introduced are harmless (like our char-clause for
-@{text inj}). We have proved this algorithm to be correct, but correct
-according to our own specification of what POSIX values are. Our
-specification (inspired from work by Vansummeren \cite{Vansummeren2006}) appears to be
-much simpler than in \cite{Sulzmann2014} and our proofs are nearly always
-straightforward. We have attempted to formalise the original proof
-by Sulzmann and Lu \cite{Sulzmann2014}, but we believe it contains
-unfillable gaps. In the online version of \cite{Sulzmann2014}, the authors
-already acknowledge some small problems, but our experience suggests
-that there are more serious problems.
-Having proved the correctness of the POSIX lexing algorithm in
-\cite{Sulzmann2014}, which lessons have we learned? Well, this is a
-perfect example for the importance of the \emph{right} definitions. We
-have (on and off) explored mechanisations as soon as first versions
-of \cite{Sulzmann2014} appeared, but have made little progress with
-turning the relatively detailed proof sketch in \cite{Sulzmann2014} into a
-formalisable proof. Having seen \cite{Vansummeren2006} and adapted the
-POSIX definition given there for the algorithm by Sulzmann and Lu made all
-the difference: the proofs, as said, are nearly straightforward. The
-question remains whether the original proof idea of \cite{Sulzmann2014},
-potentially using our result as a stepping stone, can be made to work?
-Alas, we really do not know despite considerable effort.
-Closely related to our work is an automata-based lexer formalised by
-Nipkow \cite{Nipkow98}. This lexer also splits up strings into longest
-initial substrings, but Nipkow's algorithm is not completely
-computational. The algorithm by Sulzmann and Lu, in contrast, can be
-implemented with ease in any functional language. A bespoke lexer for the
-Imp-language is formalised in Coq as part of the Software Foundations book
-by Pierce et al \cite{Pierce2015}. The disadvantage of such bespoke lexers is that they
-do not generalise easily to more advanced features.
-Our formalisation is available from the Archive of Formal Proofs \cite{aduAFP16}
-under \url{http://www.isa-afp.org/entries/Posix-Lexing.shtml}.\medskip
-\noindent
-{\bf Acknowledgements:}
-We are very grateful to Martin Sulzmann for his comments on our work and
-moreover for patiently explaining to us the details in \cite{Sulzmann2014}. We
-also received very helpful comments from James Cheney and anonymous referees.
-%  \small
-\bibliographystyle{plain}
-\bibliography{root}
-*}
 (*<*)
 end
 (*>*)

changeset 280	c840a99a3e05
parent 275	deea42c83c9e
child 287	95b3880d428f