diff -r 03ca57e3f199 -r 23e68b81a908 thys/Paper/Paper.thy --- a/thys/Paper/Paper.thy Tue Mar 08 09:25:17 2016 +0000 +++ b/thys/Paper/Paper.thy Tue Mar 08 10:21:18 2016 +0000 @@ -941,31 +941,43 @@ \end{tabular} \end{center} + \noindent + The idea behind the rules $(A1)$ and $(A2)$ is that a $Left$-value is + bigger than a $Right$-value, if the underlying string of the $Left$-value is + bigger or equals to the underlying string of the $Right$-value. The order + is reversed if the $Right$-value can match longer string than a $Left$-value. + In this way the POSIX value is supposed to be the biggest value + for a given string and regular expression. + Sulzmann and Lu explicitly refer to the paper \cite{Frisch2004} by Frisch - and Cardelli from where they have taken their main idea for their - correctness proof of the POSIX value algorithm. Frisch and Cardelli - introduced an ordering, written $\greedy$, for values and they show that - their greedy matching algorithm always produces a maximal element - according to this ordering (from all possible solutions). The only - difference between their greedy ordering and the ``ordering'' by Sulzmann - and Lu is that GREEDY always prefers a $Left$-value over a $Right$-value. - What is interesting for our purposes is that the properties reflexivity, - totality and transitivity for this GREEDY ordering can be proved - relatively easily by induction. + and Cardelli from where they have taken the idea for their correctness + proof of the POSIX value algorithm. Frisch and Cardelli introduced a + similar ordering for GREEDY matching and they show that their greedy + matching algorithm always produces a maximal element according to this + ordering (from all possible solutions). The only difference between their + GREEDY ordering and the ``ordering'' by Sulzmann and Lu is that GREEDY, if + possible, always prefers a $Left$-value over a $Right$-value, no matter + what the underlying string is. This seems like only a very minor + difference, but it leads to drastic consequences in terms what + properties both orderings enjoy. What is interesting for our purposes is + that the properties reflexivity, totality and transitivity for this GREEDY + ordering can be proved relatively easily by induction. - These properties of GREEDY, however, do not transfer to POSIX by - Sulzmann and Lu. To start with, transitivity does not hold anymore in the -``normal'' formulation, that is: + These properties of GREEDY, however, do not transfer to the POSIX + ``ordering'' by Sulzmann and Lu. To start with, $v \geq_r v′$ is not + defined inductively, but as $v = v′$ or $v >_r v′ \wedge |v| = |v′|$. This + means that $v >_r v′$ does not imply $v \geq_r v′$. Moreover, transitivity + does not hold in the ``usual'' formulation, for example: -\begin{property} +\begin{proposition} Suppose $v_1 : r$, $v_2 : r$ and $v_3 : r$. If $v_1 \posix_r v_2$ and $v_2 \posix_r v_3$ then $v_1 \posix_r v_3$. -\end{property} +\end{proposition} \noindent If formulated like this, then there are various counter examples: -Suppose $r$ is $a + ((a + a)(a + \textbf{0}))$ then the $v_1$, $v_2$ and $v_3$ -below are values of $r$: +FOr example let $r$ be $a + ((a + a)\cdot (a + \textbf{0}))$ then the $v_1$, +$v_2$ and $v_3$ below are values of $r$: \begin{center} \begin{tabular}{lcl} @@ -985,9 +997,9 @@ Sulzmann and Lu ``fix'' this problem by weakening the transitivity property. They require in addition that the underlying strings are of the same length. This excludes the -counter example above and any counter-example we could find -with our implementation. Thus the transitivity lemma in -\cite{Sulzmann2014} is: +counter example above and any counter-example we were able +to find (by hand and by machine). Thus the transitivity +lemma in should be formulated as: \begin{property} Suppose $v_1 : r$, $v_2 : r$ and @@ -996,16 +1008,19 @@ then $v_1 \posix_r v_3$. \end{property} -\noindent While we agree with Sulzmann and Lu that this -property probably holds, proving it seems not so -straightforward. Sulzmann and Lu do not give an explicit proof +\noindent While we agree with Sulzmann and Lu that this property probably +holds, proving it seems not so straightforward: although one begins with the +assumption that the values have the same flattening, this +cannot be maintained as one descends into the induction. This is +a problem that occurs in a number of places in their proof. +Sulzmann and Lu do not give an explicit proof of the transitivity property, but give a closely related property about the existence of maximal elements. They state that this can be verified by an induction on $r$. We disagree with this as we shall show next in case of transitivity. The case where the reasoning breaks down is the sequence case, -say $r_1\,r_2$. The induction hypotheses in this case +say $r_1\cdot r_2$. The induction hypotheses in this case are \begin{center} @@ -1037,9 +1052,9 @@ We can assume that \begin{equation} -Seq\,v_{1l}\, v_{1r}) \posix_{r_1\,r_2} Seq\,v_{2l}\, v_{2r}) +Seq\,v_{1l}\, v_{1r} \posix_{r_1\cdot r_2} Seq\,v_{2l}\, v_{2r} \qquad\textrm{and}\qquad -Seq\,v_{2l}\, v_{2r}) \posix_{r_1\,r_2} Seq\,v_{3l}\, v_{3r}) +Seq\,v_{2l}\, v_{2r} \posix_{r_1\cdot r_2} Seq\,v_{3l}\, v_{3r} \label{assms} \end{equation} @@ -1058,7 +1073,7 @@ We need to show that \[ -(v_{1l}, v_{1r}) \posix_{r_1\,r_2} (v_{3l}, v_{3r}) +Seq\,v_{1l}\, v_{1r}) \posix_{r_1\cdot r_2} Seq\, v_{3l}\, v_{3r} \] \noindent holds. We can proceed by analysing how the @@ -1072,7 +1087,7 @@ v_{2l} \posix_{r_1} v_{3l} \] -\noindent and also know the corresponding typing judgements. +\noindent and also know the corresponding inhabitation judgements. This is exactly a case where we would like to apply the induction hypothesis IH~$r_1$. But we cannot! We still need to show that $|v_{1l}| = |v_{2l}|$ and $|v_{2l}| = |v_{3l}|$. We @@ -1088,32 +1103,47 @@ \] \noindent but still \eqref{lens} will hold. Now we are stuck, -since the IH does not apply. - +since the IH does not apply. As said, this problem where the +induction hypothesis does not apply arises in several places in +the proof of Sulzmann and LU, not just when proving transitivity. *} section {* Conclusion *} - text {* - Nipkow lexer from 2000 - \noindent - We have also introduced a slightly restricted version of this relation - where the last rule is restricted so that @{term "flat v \ []"}. - \bigskip + We have implemented the POSIX value calculation algorithm introduced in + \cite{Sulzmann2014}. Our implementation is nearly identical to the + original and all modifications are harmless (like our char-clause for + @{text inj}). We have proved this algorithm to be correct, but correct + according to our own specification of what POSIX values are. Our + specification (inspired from work in \cite{Vansummeren2006}) appears to be + much simpler than in \cite{Sulzmann2014} and our proofs are nearly always + straightforward. We have attempted to formalise the original proof + by Sulzmann and Lu \cite{Sulzmann2014}, but we believe it contains + unfillable gaps. In the online version of \cite{Sulzmann2014}, the authors + already acknowledge some small problems, but our experience suggests + there are more serious problems. + + Closesly related to our work is an automata-based lexer formalised by + Nipkow \cite{Nipkow98}. This lexer also splits up strings into longest + initial substrings, but Nipkow's algorithm is not completely + computational. The algorithm by Sulzmann and Lu, in contrast, can be + implemented easily in functional languages. A bespoke lexer for the + Imp-Language is formalised in Coq as part of the Software Foundations book + \cite{Pierce2015}. The disadvantage of such bespoke lexers is that they + do not generalise easily to more advanced features. + Our formalisation is available from - -*} + \begin{center} + \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex} + \end{center} - -text {* %\noindent %{\bf Acknowledgements:} %We are grateful for the comments we received from anonymous %referees. - \bibliographystyle{plain} \bibliography{root}