939 $\infer{ Stars\,vs_{1} \posix_{r^{\star}} Stars\,vs_{2}} |
939 $\infer{ Stars\,vs_{1} \posix_{r^{\star}} Stars\,vs_{2}} |
940 {Stars\,(v :: vs_{1}) \posix_{r^{\star}} Stars\,(v :: vs_{2})}(K4)$ |
940 {Stars\,(v :: vs_{1}) \posix_{r^{\star}} Stars\,(v :: vs_{2})}(K4)$ |
941 \end{tabular} |
941 \end{tabular} |
942 \end{center} |
942 \end{center} |
943 |
943 |
|
944 \noindent |
|
945 The idea behind the rules $(A1)$ and $(A2)$ is that a $Left$-value is |
|
946 bigger than a $Right$-value, if the underlying string of the $Left$-value is |
|
947 bigger or equals to the underlying string of the $Right$-value. The order |
|
948 is reversed if the $Right$-value can match longer string than a $Left$-value. |
|
949 In this way the POSIX value is supposed to be the biggest value |
|
950 for a given string and regular expression. |
|
951 |
944 Sulzmann and Lu explicitly refer to the paper \cite{Frisch2004} by Frisch |
952 Sulzmann and Lu explicitly refer to the paper \cite{Frisch2004} by Frisch |
945 and Cardelli from where they have taken their main idea for their |
953 and Cardelli from where they have taken the idea for their correctness |
946 correctness proof of the POSIX value algorithm. Frisch and Cardelli |
954 proof of the POSIX value algorithm. Frisch and Cardelli introduced a |
947 introduced an ordering, written $\greedy$, for values and they show that |
955 similar ordering for GREEDY matching and they show that their greedy |
948 their greedy matching algorithm always produces a maximal element |
956 matching algorithm always produces a maximal element according to this |
949 according to this ordering (from all possible solutions). The only |
957 ordering (from all possible solutions). The only difference between their |
950 difference between their greedy ordering and the ``ordering'' by Sulzmann |
958 GREEDY ordering and the ``ordering'' by Sulzmann and Lu is that GREEDY, if |
951 and Lu is that GREEDY always prefers a $Left$-value over a $Right$-value. |
959 possible, always prefers a $Left$-value over a $Right$-value, no matter |
952 What is interesting for our purposes is that the properties reflexivity, |
960 what the underlying string is. This seems like only a very minor |
953 totality and transitivity for this GREEDY ordering can be proved |
961 difference, but it leads to drastic consequences in terms what |
954 relatively easily by induction. |
962 properties both orderings enjoy. What is interesting for our purposes is |
955 |
963 that the properties reflexivity, totality and transitivity for this GREEDY |
956 These properties of GREEDY, however, do not transfer to POSIX by |
964 ordering can be proved relatively easily by induction. |
957 Sulzmann and Lu. To start with, transitivity does not hold anymore in the |
965 |
958 ``normal'' formulation, that is: |
966 These properties of GREEDY, however, do not transfer to the POSIX |
959 |
967 ``ordering'' by Sulzmann and Lu. To start with, $v \geq_r v′$ is not |
960 \begin{property} |
968 defined inductively, but as $v = v′$ or $v >_r v′ \wedge |v| = |v′|$. This |
|
969 means that $v >_r v′$ does not imply $v \geq_r v′$. Moreover, transitivity |
|
970 does not hold in the ``usual'' formulation, for example: |
|
971 |
|
972 \begin{proposition} |
961 Suppose $v_1 : r$, $v_2 : r$ and $v_3 : r$. |
973 Suppose $v_1 : r$, $v_2 : r$ and $v_3 : r$. |
962 If $v_1 \posix_r v_2$ and $v_2 \posix_r v_3$ |
974 If $v_1 \posix_r v_2$ and $v_2 \posix_r v_3$ |
963 then $v_1 \posix_r v_3$. |
975 then $v_1 \posix_r v_3$. |
964 \end{property} |
976 \end{proposition} |
965 |
977 |
966 \noindent If formulated like this, then there are various counter examples: |
978 \noindent If formulated like this, then there are various counter examples: |
967 Suppose $r$ is $a + ((a + a)(a + \textbf{0}))$ then the $v_1$, $v_2$ and $v_3$ |
979 FOr example let $r$ be $a + ((a + a)\cdot (a + \textbf{0}))$ then the $v_1$, |
968 below are values of $r$: |
980 $v_2$ and $v_3$ below are values of $r$: |
969 |
981 |
970 \begin{center} |
982 \begin{center} |
971 \begin{tabular}{lcl} |
983 \begin{tabular}{lcl} |
972 $v_1$ & $=$ & $Left(Char\;a)$\\ |
984 $v_1$ & $=$ & $Left(Char\;a)$\\ |
973 $v_2$ & $=$ & $Right((Left(Char\;a), Right(Void)))$\\ |
985 $v_2$ & $=$ & $Right((Left(Char\;a), Right(Void)))$\\ |
983 example actually $v_3 \posix_r v_1$! |
995 example actually $v_3 \posix_r v_1$! |
984 |
996 |
985 Sulzmann and Lu ``fix'' this problem by weakening the |
997 Sulzmann and Lu ``fix'' this problem by weakening the |
986 transitivity property. They require in addition that the |
998 transitivity property. They require in addition that the |
987 underlying strings are of the same length. This excludes the |
999 underlying strings are of the same length. This excludes the |
988 counter example above and any counter-example we could find |
1000 counter example above and any counter-example we were able |
989 with our implementation. Thus the transitivity lemma in |
1001 to find (by hand and by machine). Thus the transitivity |
990 \cite{Sulzmann2014} is: |
1002 lemma in should be formulated as: |
991 |
1003 |
992 \begin{property} |
1004 \begin{property} |
993 Suppose $v_1 : r$, $v_2 : r$ and |
1005 Suppose $v_1 : r$, $v_2 : r$ and |
994 $v_3 : r$, and also $|v_1|=|v_2|=|v_3|$.\\ |
1006 $v_3 : r$, and also $|v_1|=|v_2|=|v_3|$.\\ |
995 If $v_1 \posix_r v_2$ and $v_2 \posix_r v_3$ |
1007 If $v_1 \posix_r v_2$ and $v_2 \posix_r v_3$ |
996 then $v_1 \posix_r v_3$. |
1008 then $v_1 \posix_r v_3$. |
997 \end{property} |
1009 \end{property} |
998 |
1010 |
999 \noindent While we agree with Sulzmann and Lu that this |
1011 \noindent While we agree with Sulzmann and Lu that this property probably |
1000 property probably holds, proving it seems not so |
1012 holds, proving it seems not so straightforward: although one begins with the |
1001 straightforward. Sulzmann and Lu do not give an explicit proof |
1013 assumption that the values have the same flattening, this |
|
1014 cannot be maintained as one descends into the induction. This is |
|
1015 a problem that occurs in a number of places in their proof. |
|
1016 Sulzmann and Lu do not give an explicit proof |
1002 of the transitivity property, but give a closely related |
1017 of the transitivity property, but give a closely related |
1003 property about the existence of maximal elements. They state |
1018 property about the existence of maximal elements. They state |
1004 that this can be verified by an induction on $r$. We disagree |
1019 that this can be verified by an induction on $r$. We disagree |
1005 with this as we shall show next in case of transitivity. |
1020 with this as we shall show next in case of transitivity. |
1006 |
1021 |
1007 The case where the reasoning breaks down is the sequence case, |
1022 The case where the reasoning breaks down is the sequence case, |
1008 say $r_1\,r_2$. The induction hypotheses in this case |
1023 say $r_1\cdot r_2$. The induction hypotheses in this case |
1009 are |
1024 are |
1010 |
1025 |
1011 \begin{center} |
1026 \begin{center} |
1012 \begin{tabular}{@ {}cc@ {}} |
1027 \begin{tabular}{@ {}cc@ {}} |
1013 \begin{tabular}{@ {}ll@ {}} |
1028 \begin{tabular}{@ {}ll@ {}} |
1035 |
1050 |
1036 \noindent |
1051 \noindent |
1037 We can assume that |
1052 We can assume that |
1038 |
1053 |
1039 \begin{equation} |
1054 \begin{equation} |
1040 Seq\,v_{1l}\, v_{1r}) \posix_{r_1\,r_2} Seq\,v_{2l}\, v_{2r}) |
1055 Seq\,v_{1l}\, v_{1r} \posix_{r_1\cdot r_2} Seq\,v_{2l}\, v_{2r} |
1041 \qquad\textrm{and}\qquad |
1056 \qquad\textrm{and}\qquad |
1042 Seq\,v_{2l}\, v_{2r}) \posix_{r_1\,r_2} Seq\,v_{3l}\, v_{3r}) |
1057 Seq\,v_{2l}\, v_{2r} \posix_{r_1\cdot r_2} Seq\,v_{3l}\, v_{3r} |
1043 \label{assms} |
1058 \label{assms} |
1044 \end{equation} |
1059 \end{equation} |
1045 |
1060 |
1046 \noindent |
1061 \noindent |
1047 hold, and furthermore that the values have equal length, |
1062 hold, and furthermore that the values have equal length, |
1070 v_{1l} \posix_{r_1} v_{2l} |
1085 v_{1l} \posix_{r_1} v_{2l} |
1071 \qquad\textrm{and}\qquad |
1086 \qquad\textrm{and}\qquad |
1072 v_{2l} \posix_{r_1} v_{3l} |
1087 v_{2l} \posix_{r_1} v_{3l} |
1073 \] |
1088 \] |
1074 |
1089 |
1075 \noindent and also know the corresponding typing judgements. |
1090 \noindent and also know the corresponding inhabitation judgements. |
1076 This is exactly a case where we would like to apply the |
1091 This is exactly a case where we would like to apply the |
1077 induction hypothesis IH~$r_1$. But we cannot! We still need to |
1092 induction hypothesis IH~$r_1$. But we cannot! We still need to |
1078 show that $|v_{1l}| = |v_{2l}|$ and $|v_{2l}| = |v_{3l}|$. We |
1093 show that $|v_{1l}| = |v_{2l}|$ and $|v_{2l}| = |v_{3l}|$. We |
1079 know from \eqref{lens} that the lengths of the sequence values |
1094 know from \eqref{lens} that the lengths of the sequence values |
1080 are equal, but from this we cannot infer anything about the |
1095 are equal, but from this we cannot infer anything about the |
1086 \qquad\textrm{and}\qquad |
1101 \qquad\textrm{and}\qquad |
1087 |v_{1r}| \not= |v_{2r}| |
1102 |v_{1r}| \not= |v_{2r}| |
1088 \] |
1103 \] |
1089 |
1104 |
1090 \noindent but still \eqref{lens} will hold. Now we are stuck, |
1105 \noindent but still \eqref{lens} will hold. Now we are stuck, |
1091 since the IH does not apply. |
1106 since the IH does not apply. As said, this problem where the |
1092 |
1107 induction hypothesis does not apply arises in several places in |
|
1108 the proof of Sulzmann and LU, not just when proving transitivity. |
1093 |
1109 |
1094 *} |
1110 *} |
1095 |
1111 |
1096 section {* Conclusion *} |
1112 section {* Conclusion *} |
1097 |
1113 |
1098 |
|
1099 text {* |
1114 text {* |
1100 Nipkow lexer from 2000 |
1115 |
1101 |
1116 We have implemented the POSIX value calculation algorithm introduced in |
1102 \noindent |
1117 \cite{Sulzmann2014}. Our implementation is nearly identical to the |
1103 We have also introduced a slightly restricted version of this relation |
1118 original and all modifications are harmless (like our char-clause for |
1104 where the last rule is restricted so that @{term "flat v \<noteq> []"}. |
1119 @{text inj}). We have proved this algorithm to be correct, but correct |
1105 \bigskip |
1120 according to our own specification of what POSIX values are. Our |
1106 |
1121 specification (inspired from work in \cite{Vansummeren2006}) appears to be |
1107 |
1122 much simpler than in \cite{Sulzmann2014} and our proofs are nearly always |
1108 *} |
1123 straightforward. We have attempted to formalise the original proof |
1109 |
1124 by Sulzmann and Lu \cite{Sulzmann2014}, but we believe it contains |
1110 |
1125 unfillable gaps. In the online version of \cite{Sulzmann2014}, the authors |
1111 text {* |
1126 already acknowledge some small problems, but our experience suggests |
|
1127 there are more serious problems. |
|
1128 |
|
1129 Closesly related to our work is an automata-based lexer formalised by |
|
1130 Nipkow \cite{Nipkow98}. This lexer also splits up strings into longest |
|
1131 initial substrings, but Nipkow's algorithm is not completely |
|
1132 computational. The algorithm by Sulzmann and Lu, in contrast, can be |
|
1133 implemented easily in functional languages. A bespoke lexer for the |
|
1134 Imp-Language is formalised in Coq as part of the Software Foundations book |
|
1135 \cite{Pierce2015}. The disadvantage of such bespoke lexers is that they |
|
1136 do not generalise easily to more advanced features. |
|
1137 Our formalisation is available from |
|
1138 |
|
1139 \begin{center} |
|
1140 \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex} |
|
1141 \end{center} |
|
1142 |
1112 %\noindent |
1143 %\noindent |
1113 %{\bf Acknowledgements:} |
1144 %{\bf Acknowledgements:} |
1114 %We are grateful for the comments we received from anonymous |
1145 %We are grateful for the comments we received from anonymous |
1115 %referees. |
1146 %referees. |
1116 |
|
1117 \bibliographystyle{plain} |
1147 \bibliographystyle{plain} |
1118 \bibliography{root} |
1148 \bibliography{root} |
1119 |
1149 |
1120 *} |
1150 *} |
1121 |
1151 |