thys/Paper/Paper.thy
changeset 133 23e68b81a908
parent 132 03ca57e3f199
child 134 2f043f8be9a9
equal deleted inserted replaced
132:03ca57e3f199 133:23e68b81a908
   939 $\infer{ Stars\,vs_{1} \posix_{r^{\star}} Stars\,vs_{2}} 
   939 $\infer{ Stars\,vs_{1} \posix_{r^{\star}} Stars\,vs_{2}} 
   940        {Stars\,(v :: vs_{1}) \posix_{r^{\star}} Stars\,(v :: vs_{2})}(K4)$	
   940        {Stars\,(v :: vs_{1}) \posix_{r^{\star}} Stars\,(v :: vs_{2})}(K4)$	
   941 \end{tabular}
   941 \end{tabular}
   942 \end{center}
   942 \end{center}
   943 
   943 
       
   944   \noindent
       
   945   The idea behind the rules $(A1)$ and $(A2)$ is that a $Left$-value is
       
   946   bigger than a $Right$-value, if the underlying string of the $Left$-value is
       
   947   bigger or equals to the underlying string of the $Right$-value. The order
       
   948   is reversed if the $Right$-value can match longer string than a $Left$-value.
       
   949   In this way the POSIX value is supposed to be the biggest value
       
   950   for a given string and regular expression.
       
   951 
   944   Sulzmann and Lu explicitly refer to the paper \cite{Frisch2004} by Frisch
   952   Sulzmann and Lu explicitly refer to the paper \cite{Frisch2004} by Frisch
   945   and Cardelli from where they have taken their main idea for their
   953   and Cardelli from where they have taken the idea for their correctness
   946   correctness proof of the POSIX value algorithm. Frisch and Cardelli
   954   proof of the POSIX value algorithm. Frisch and Cardelli introduced a
   947   introduced an ordering, written $\greedy$, for values and they show that
   955   similar ordering for GREEDY matching and they show that their greedy
   948   their greedy matching algorithm always produces a maximal element
   956   matching algorithm always produces a maximal element according to this
   949   according to this ordering (from all possible solutions). The only
   957   ordering (from all possible solutions). The only difference between their
   950   difference between their greedy ordering and the ``ordering'' by Sulzmann
   958   GREEDY ordering and the ``ordering'' by Sulzmann and Lu is that GREEDY, if
   951   and Lu is that GREEDY always prefers a $Left$-value over a $Right$-value.
   959   possible, always prefers a $Left$-value over a $Right$-value, no matter
   952   What is interesting for our purposes is that the properties reflexivity,
   960   what the underlying string is. This seems like only a very minor
   953   totality and transitivity for this GREEDY ordering can be proved
   961   difference, but it leads to drastic consequences in terms what
   954   relatively easily by induction.
   962   properties both orderings enjoy. What is interesting for our purposes is
   955 
   963   that the properties reflexivity, totality and transitivity for this GREEDY
   956   These properties of GREEDY, however, do not transfer to POSIX by 
   964   ordering can be proved relatively easily by induction.
   957   Sulzmann and Lu. To start with, transitivity does not hold anymore in the
   965 
   958 ``normal'' formulation, that is:
   966   These properties of GREEDY, however, do not transfer to the POSIX
   959 
   967   ``ordering'' by Sulzmann and Lu. To start with, $v \geq_r v′$ is not
   960 \begin{property}
   968   defined inductively, but as $v = v′$ or $v >_r v′ \wedge |v| = |v′|$. This
       
   969   means that $v >_r v′$ does not imply $v \geq_r v′$. Moreover, transitivity
       
   970   does not hold in the ``usual'' formulation, for example:
       
   971 
       
   972 \begin{proposition}
   961 Suppose $v_1 : r$, $v_2 : r$ and $v_3 : r$.
   973 Suppose $v_1 : r$, $v_2 : r$ and $v_3 : r$.
   962 If $v_1 \posix_r v_2$ and $v_2 \posix_r v_3$
   974 If $v_1 \posix_r v_2$ and $v_2 \posix_r v_3$
   963 then $v_1 \posix_r v_3$.
   975 then $v_1 \posix_r v_3$.
   964 \end{property}
   976 \end{proposition}
   965 
   977 
   966 \noindent If formulated like this, then there are various counter examples:
   978 \noindent If formulated like this, then there are various counter examples:
   967 Suppose $r$ is $a + ((a + a)(a + \textbf{0}))$ then the $v_1$, $v_2$ and $v_3$
   979 FOr example let $r$ be $a + ((a + a)\cdot (a + \textbf{0}))$ then the $v_1$,
   968 below are values of $r$:
   980 $v_2$ and $v_3$ below are values of $r$:
   969 
   981 
   970 \begin{center}
   982 \begin{center}
   971 \begin{tabular}{lcl}
   983 \begin{tabular}{lcl}
   972  $v_1$ & $=$ & $Left(Char\;a)$\\
   984  $v_1$ & $=$ & $Left(Char\;a)$\\
   973  $v_2$ & $=$ & $Right((Left(Char\;a), Right(Void)))$\\
   985  $v_2$ & $=$ & $Right((Left(Char\;a), Right(Void)))$\\
   983 example actually $v_3 \posix_r v_1$!
   995 example actually $v_3 \posix_r v_1$!
   984 
   996 
   985 Sulzmann and Lu ``fix'' this problem by weakening the
   997 Sulzmann and Lu ``fix'' this problem by weakening the
   986 transitivity property. They require in addition that the
   998 transitivity property. They require in addition that the
   987 underlying strings are of the same length. This excludes the
   999 underlying strings are of the same length. This excludes the
   988 counter example above and any counter-example we could find
  1000 counter example above and any counter-example we were able 
   989 with our implementation. Thus the transitivity lemma in
  1001 to find (by hand and by machine). Thus the transitivity 
   990 \cite{Sulzmann2014} is:
  1002 lemma in should be formulated as:
   991 
  1003 
   992 \begin{property}
  1004 \begin{property}
   993 Suppose $v_1 : r$, $v_2 : r$ and 
  1005 Suppose $v_1 : r$, $v_2 : r$ and 
   994 $v_3 : r$, and also $|v_1|=|v_2|=|v_3|$.\\
  1006 $v_3 : r$, and also $|v_1|=|v_2|=|v_3|$.\\
   995 If $v_1 \posix_r v_2$ and $v_2 \posix_r v_3$
  1007 If $v_1 \posix_r v_2$ and $v_2 \posix_r v_3$
   996 then $v_1 \posix_r v_3$.
  1008 then $v_1 \posix_r v_3$.
   997 \end{property}
  1009 \end{property}
   998 
  1010 
   999 \noindent While we agree with Sulzmann and Lu that this
  1011 \noindent While we agree with Sulzmann and Lu that this property probably
  1000 property probably holds, proving it seems not so
  1012 holds, proving it seems not so straightforward: although one begins with the
  1001 straightforward. Sulzmann and Lu do not give an explicit proof
  1013 assumption that the values have the same flattening, this
       
  1014 cannot be maintained as one descends into the induction. This is 
       
  1015 a problem that occurs in a number of places in their proof.
       
  1016 Sulzmann and Lu do not give an explicit proof
  1002 of the transitivity property, but give a closely related
  1017 of the transitivity property, but give a closely related
  1003 property about the existence of maximal elements. They state
  1018 property about the existence of maximal elements. They state
  1004 that this can be verified by an induction on $r$. We disagree
  1019 that this can be verified by an induction on $r$. We disagree
  1005 with this as we shall show next in case of transitivity.
  1020 with this as we shall show next in case of transitivity.
  1006 
  1021 
  1007 The case where the reasoning breaks down is the sequence case,
  1022 The case where the reasoning breaks down is the sequence case,
  1008 say $r_1\,r_2$. The induction hypotheses in this case
  1023 say $r_1\cdot r_2$. The induction hypotheses in this case
  1009 are
  1024 are
  1010 
  1025 
  1011 \begin{center}
  1026 \begin{center}
  1012 \begin{tabular}{@ {}cc@ {}}
  1027 \begin{tabular}{@ {}cc@ {}}
  1013 \begin{tabular}{@ {}ll@ {}}
  1028 \begin{tabular}{@ {}ll@ {}}
  1035 
  1050 
  1036 \noindent
  1051 \noindent
  1037 We can assume that 
  1052 We can assume that 
  1038 
  1053 
  1039 \begin{equation}
  1054 \begin{equation}
  1040 Seq\,v_{1l}\, v_{1r}) \posix_{r_1\,r_2} Seq\,v_{2l}\, v_{2r})
  1055 Seq\,v_{1l}\, v_{1r} \posix_{r_1\cdot r_2} Seq\,v_{2l}\, v_{2r}
  1041 \qquad\textrm{and}\qquad
  1056 \qquad\textrm{and}\qquad
  1042 Seq\,v_{2l}\, v_{2r}) \posix_{r_1\,r_2} Seq\,v_{3l}\, v_{3r})
  1057 Seq\,v_{2l}\, v_{2r} \posix_{r_1\cdot r_2} Seq\,v_{3l}\, v_{3r}
  1043 \label{assms}
  1058 \label{assms}
  1044 \end{equation}
  1059 \end{equation}
  1045 
  1060 
  1046 \noindent
  1061 \noindent
  1047 hold, and furthermore that the values have equal length, 
  1062 hold, and furthermore that the values have equal length, 
  1056 
  1071 
  1057 \noindent
  1072 \noindent
  1058 We need to show that
  1073 We need to show that
  1059 
  1074 
  1060 \[
  1075 \[
  1061 (v_{1l}, v_{1r}) \posix_{r_1\,r_2} (v_{3l}, v_{3r})
  1076 Seq\,v_{1l}\, v_{1r}) \posix_{r_1\cdot r_2} Seq\, v_{3l}\, v_{3r}
  1062 \]
  1077 \]
  1063 
  1078 
  1064 \noindent holds. We can proceed by analysing how the
  1079 \noindent holds. We can proceed by analysing how the
  1065 assumptions in \eqref{assms} have arisen. There are four
  1080 assumptions in \eqref{assms} have arisen. There are four
  1066 cases. Let us assume we are in the case where
  1081 cases. Let us assume we are in the case where
  1070 v_{1l} \posix_{r_1} v_{2l}
  1085 v_{1l} \posix_{r_1} v_{2l}
  1071 \qquad\textrm{and}\qquad
  1086 \qquad\textrm{and}\qquad
  1072 v_{2l} \posix_{r_1} v_{3l}
  1087 v_{2l} \posix_{r_1} v_{3l}
  1073 \]
  1088 \]
  1074 
  1089 
  1075 \noindent and also know the corresponding typing judgements.
  1090 \noindent and also know the corresponding inhabitation judgements.
  1076 This is exactly a case where we would like to apply the
  1091 This is exactly a case where we would like to apply the
  1077 induction hypothesis IH~$r_1$. But we cannot! We still need to
  1092 induction hypothesis IH~$r_1$. But we cannot! We still need to
  1078 show that $|v_{1l}| = |v_{2l}|$ and $|v_{2l}| = |v_{3l}|$. We
  1093 show that $|v_{1l}| = |v_{2l}|$ and $|v_{2l}| = |v_{3l}|$. We
  1079 know from \eqref{lens} that the lengths of the sequence values
  1094 know from \eqref{lens} that the lengths of the sequence values
  1080 are equal, but from this we cannot infer anything about the
  1095 are equal, but from this we cannot infer anything about the
  1086 \qquad\textrm{and}\qquad
  1101 \qquad\textrm{and}\qquad
  1087 |v_{1r}| \not= |v_{2r}|
  1102 |v_{1r}| \not= |v_{2r}|
  1088 \]
  1103 \]
  1089 
  1104 
  1090 \noindent but still \eqref{lens} will hold. Now we are stuck,
  1105 \noindent but still \eqref{lens} will hold. Now we are stuck,
  1091 since the IH does not apply. 
  1106 since the IH does not apply. As said, this problem where the
  1092 
  1107 induction hypothesis does not apply arises in several places in
       
  1108 the proof of Sulzmann and LU, not just when proving transitivity.
  1093 
  1109 
  1094 *}
  1110 *}
  1095 
  1111 
  1096 section {* Conclusion *}
  1112 section {* Conclusion *}
  1097 
  1113 
  1098 
       
  1099 text {*
  1114 text {*
  1100    Nipkow lexer from 2000
  1115 
  1101 
  1116   We have implemented the POSIX value calculation algorithm introduced in
  1102   \noindent
  1117   \cite{Sulzmann2014}. Our implementation is nearly identical to the
  1103   We have also introduced a slightly restricted version of this relation
  1118   original and all modifications are harmless (like our char-clause for
  1104   where the last rule is restricted so that @{term "flat v \<noteq> []"}.
  1119   @{text inj}). We have proved this algorithm to be correct, but correct
  1105   \bigskip
  1120   according to our own specification of what POSIX values are. Our
  1106 
  1121   specification (inspired from work in \cite{Vansummeren2006}) appears to be
  1107 
  1122   much simpler than in \cite{Sulzmann2014} and our proofs are nearly always
  1108 *}
  1123   straightforward. We have attempted to formalise the original proof
  1109 
  1124   by Sulzmann and Lu \cite{Sulzmann2014}, but we believe it contains
  1110 
  1125   unfillable gaps. In the online version of \cite{Sulzmann2014}, the authors
  1111 text {*
  1126   already acknowledge some small problems, but our experience suggests
       
  1127   there are more serious problems. 
       
  1128   
       
  1129   Closesly related to our work is an automata-based lexer formalised by
       
  1130   Nipkow \cite{Nipkow98}. This lexer also splits up strings into longest
       
  1131   initial substrings, but Nipkow's algorithm is not completely
       
  1132   computational. The algorithm by Sulzmann and Lu, in contrast, can be
       
  1133   implemented easily in functional languages. A bespoke lexer for the
       
  1134   Imp-Language is formalised in Coq as part of the Software Foundations book
       
  1135   \cite{Pierce2015}. The disadvantage of such bespoke lexers is that they
       
  1136   do not generalise easily to more advanced features.
       
  1137   Our formalisation is available from
       
  1138 
       
  1139   \begin{center}
       
  1140   \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex}
       
  1141   \end{center}
       
  1142 
  1112   %\noindent
  1143   %\noindent
  1113   %{\bf Acknowledgements:}
  1144   %{\bf Acknowledgements:}
  1114   %We are grateful for the comments we received from anonymous
  1145   %We are grateful for the comments we received from anonymous
  1115   %referees.
  1146   %referees.
  1116 
       
  1117   \bibliographystyle{plain}
  1147   \bibliographystyle{plain}
  1118   \bibliography{root}
  1148   \bibliography{root}
  1119 
  1149 
  1120 *}
  1150 *}
  1121 
  1151