lexing: changeset 133:23e68b81a908

--- a/thys/Paper/Paper.thy	Tue Mar 08 09:25:17 2016 +0000
+++ b/thys/Paper/Paper.thy	Tue Mar 08 10:21:18 2016 +0000
@@ -941,31 +941,43 @@
 \end{tabular}
 \end{center}
 
+  \noindent
+  The idea behind the rules $(A1)$ and $(A2)$ is that a $Left$-value is
+  bigger than a $Right$-value, if the underlying string of the $Left$-value is
+  bigger or equals to the underlying string of the $Right$-value. The order
+  is reversed if the $Right$-value can match longer string than a $Left$-value.
+  In this way the POSIX value is supposed to be the biggest value
+  for a given string and regular expression.
+
   Sulzmann and Lu explicitly refer to the paper \cite{Frisch2004} by Frisch
-  and Cardelli from where they have taken their main idea for their
-  correctness proof of the POSIX value algorithm. Frisch and Cardelli
-  introduced an ordering, written $\greedy$, for values and they show that
-  their greedy matching algorithm always produces a maximal element
-  according to this ordering (from all possible solutions). The only
-  difference between their greedy ordering and the ``ordering'' by Sulzmann
-  and Lu is that GREEDY always prefers a $Left$-value over a $Right$-value.
-  What is interesting for our purposes is that the properties reflexivity,
-  totality and transitivity for this GREEDY ordering can be proved
-  relatively easily by induction.
+  and Cardelli from where they have taken the idea for their correctness
+  proof of the POSIX value algorithm. Frisch and Cardelli introduced a
+  similar ordering for GREEDY matching and they show that their greedy
+  matching algorithm always produces a maximal element according to this
+  ordering (from all possible solutions). The only difference between their
+  GREEDY ordering and the ``ordering'' by Sulzmann and Lu is that GREEDY, if
+  possible, always prefers a $Left$-value over a $Right$-value, no matter
+  what the underlying string is. This seems like only a very minor
+  difference, but it leads to drastic consequences in terms what
+  properties both orderings enjoy. What is interesting for our purposes is
+  that the properties reflexivity, totality and transitivity for this GREEDY
+  ordering can be proved relatively easily by induction.
 
-  These properties of GREEDY, however, do not transfer to POSIX by 
-  Sulzmann and Lu. To start with, transitivity does not hold anymore in the
-``normal'' formulation, that is:
+  These properties of GREEDY, however, do not transfer to the POSIX
+  ``ordering'' by Sulzmann and Lu. To start with, $v \geq_r v′$ is not
+  defined inductively, but as $v = v′$ or $v >_r v′ \wedge |v| = |v′|$. This
+  means that $v >_r v′$ does not imply $v \geq_r v′$. Moreover, transitivity
+  does not hold in the ``usual'' formulation, for example:
 
-\begin{property}
+\begin{proposition}
 Suppose $v_1 : r$, $v_2 : r$ and $v_3 : r$.
 If $v_1 \posix_r v_2$ and $v_2 \posix_r v_3$
 then $v_1 \posix_r v_3$.
-\end{property}
+\end{proposition}
 
 \noindent If formulated like this, then there are various counter examples:
-Suppose $r$ is $a + ((a + a)(a + \textbf{0}))$ then the $v_1$, $v_2$ and $v_3$
-below are values of $r$:
+FOr example let $r$ be $a + ((a + a)\cdot (a + \textbf{0}))$ then the $v_1$,
+$v_2$ and $v_3$ below are values of $r$:
 
 \begin{center}
 \begin{tabular}{lcl}
@@ -985,9 +997,9 @@
 Sulzmann and Lu ``fix'' this problem by weakening the
 transitivity property. They require in addition that the
 underlying strings are of the same length. This excludes the
-counter example above and any counter-example we could find
-with our implementation. Thus the transitivity lemma in
-\cite{Sulzmann2014} is:
+counter example above and any counter-example we were able 
+to find (by hand and by machine). Thus the transitivity 
+lemma in should be formulated as:
 
 \begin{property}
 Suppose $v_1 : r$, $v_2 : r$ and 
@@ -996,16 +1008,19 @@
 then $v_1 \posix_r v_3$.
 \end{property}
 
-\noindent While we agree with Sulzmann and Lu that this
-property probably holds, proving it seems not so
-straightforward. Sulzmann and Lu do not give an explicit proof
+\noindent While we agree with Sulzmann and Lu that this property probably
+holds, proving it seems not so straightforward: although one begins with the
+assumption that the values have the same flattening, this
+cannot be maintained as one descends into the induction. This is 
+a problem that occurs in a number of places in their proof.
+Sulzmann and Lu do not give an explicit proof
 of the transitivity property, but give a closely related
 property about the existence of maximal elements. They state
 that this can be verified by an induction on $r$. We disagree
 with this as we shall show next in case of transitivity.
 
 The case where the reasoning breaks down is the sequence case,
-say $r_1\,r_2$. The induction hypotheses in this case
+say $r_1\cdot r_2$. The induction hypotheses in this case
 are
 
 \begin{center}
@@ -1037,9 +1052,9 @@
 We can assume that 
 
 \begin{equation}
-Seq\,v_{1l}\, v_{1r}) \posix_{r_1\,r_2} Seq\,v_{2l}\, v_{2r})
+Seq\,v_{1l}\, v_{1r} \posix_{r_1\cdot r_2} Seq\,v_{2l}\, v_{2r}
 \qquad\textrm{and}\qquad
-Seq\,v_{2l}\, v_{2r}) \posix_{r_1\,r_2} Seq\,v_{3l}\, v_{3r})
+Seq\,v_{2l}\, v_{2r} \posix_{r_1\cdot r_2} Seq\,v_{3l}\, v_{3r}
 \label{assms}
 \end{equation}
 
@@ -1058,7 +1073,7 @@
 We need to show that
 
 \[
-(v_{1l}, v_{1r}) \posix_{r_1\,r_2} (v_{3l}, v_{3r})
+Seq\,v_{1l}\, v_{1r}) \posix_{r_1\cdot r_2} Seq\, v_{3l}\, v_{3r}
 \]
 
 \noindent holds. We can proceed by analysing how the
@@ -1072,7 +1087,7 @@
 v_{2l} \posix_{r_1} v_{3l}
 \]
 
-\noindent and also know the corresponding typing judgements.
+\noindent and also know the corresponding inhabitation judgements.
 This is exactly a case where we would like to apply the
 induction hypothesis IH~$r_1$. But we cannot! We still need to
 show that $|v_{1l}| = |v_{2l}|$ and $|v_{2l}| = |v_{3l}|$. We
@@ -1088,32 +1103,47 @@
 \]
 
 \noindent but still \eqref{lens} will hold. Now we are stuck,
-since the IH does not apply. 
-
+since the IH does not apply. As said, this problem where the
+induction hypothesis does not apply arises in several places in
+the proof of Sulzmann and LU, not just when proving transitivity.
 
 *}
 
 section {* Conclusion *}
 
-
 text {*
-   Nipkow lexer from 2000
 
-  \noindent
-  We have also introduced a slightly restricted version of this relation
-  where the last rule is restricted so that @{term "flat v \<noteq> []"}.
-  \bigskip
+  We have implemented the POSIX value calculation algorithm introduced in
+  \cite{Sulzmann2014}. Our implementation is nearly identical to the
+  original and all modifications are harmless (like our char-clause for
+  @{text inj}). We have proved this algorithm to be correct, but correct
+  according to our own specification of what POSIX values are. Our
+  specification (inspired from work in \cite{Vansummeren2006}) appears to be
+  much simpler than in \cite{Sulzmann2014} and our proofs are nearly always
+  straightforward. We have attempted to formalise the original proof
+  by Sulzmann and Lu \cite{Sulzmann2014}, but we believe it contains
+  unfillable gaps. In the online version of \cite{Sulzmann2014}, the authors
+  already acknowledge some small problems, but our experience suggests
+  there are more serious problems. 
+  
+  Closesly related to our work is an automata-based lexer formalised by
+  Nipkow \cite{Nipkow98}. This lexer also splits up strings into longest
+  initial substrings, but Nipkow's algorithm is not completely
+  computational. The algorithm by Sulzmann and Lu, in contrast, can be
+  implemented easily in functional languages. A bespoke lexer for the
+  Imp-Language is formalised in Coq as part of the Software Foundations book
+  \cite{Pierce2015}. The disadvantage of such bespoke lexers is that they
+  do not generalise easily to more advanced features.
+  Our formalisation is available from
 
-
-*}
+  \begin{center}
+  \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex}
+  \end{center}
 
-
-text {*
   %\noindent
   %{\bf Acknowledgements:}
   %We are grateful for the comments we received from anonymous
   %referees.
-
   \bibliographystyle{plain}
   \bibliography{root}

--- a/thys/Paper/document/root.bib	Tue Mar 08 09:25:17 2016 +0000
+++ b/thys/Paper/document/root.bib	Tue Mar 08 10:21:18 2016 +0000
@@ -20,6 +20,16 @@
   series =    {LNCS}
 }
 
+@book{Pierce2015,
+  author = {B.~C.~Pierce and C.~Casinghino and M.~Gaboardi and
+            M.~Greenberg and C.~Hri\c{t}cu and 
+            V.~Sjoberg and B.~Yorgey},
+  title = {{S}oftware {F}oundations},
+  year = {2015},
+  publisher = {Electronic textbook},
+  note = {\url{http://www.cis.upenn.edu/~bcpierce/sf}}
+}
+
 @Misc{Kuklewicz,
   author = 	 {C.~Kuklewicz},
   title = 	 {{R}egex {P}osix},

Binary file thys/paper.pdf has changed

author	Christian Urban <christian dot urban at kcl dot ac dot uk>
	Tue, 08 Mar 2016 10:21:18 +0000
changeset 133	23e68b81a908
parent 132	03ca57e3f199
child 134	2f043f8be9a9

thys/Paper/Paper.thy		file \| annotate \| diff \| comparison \| revisions
thys/Paper/document/root.bib		file \| annotate \| diff \| comparison \| revisions
thys/paper.pdf		file \| annotate \| diff \| comparison \| revisions