195 |
195 |
196 One limitation of Brzozowski's matcher is that it only generates a |
196 One limitation of Brzozowski's matcher is that it only generates a |
197 YES/NO answer for whether a string is being matched by a regular |
197 YES/NO answer for whether a string is being matched by a regular |
198 expression. Sulzmann and Lu~\cite{Sulzmann2014} extended this matcher |
198 expression. Sulzmann and Lu~\cite{Sulzmann2014} extended this matcher |
199 to allow generation not just of a YES/NO answer but of an actual |
199 to allow generation not just of a YES/NO answer but of an actual |
200 matching, called a [lexical] {\em value}. They give a simple algorithm |
200 matching, called a [lexical] {\em value}. \marginpar{explain values; |
201 to calculate a value that appears to be the value associated with |
201 who introduced them} They give a simple algorithm to calculate a value |
202 POSIX matching. The challenge then is to specify that value, in an |
202 that appears to be the value associated with POSIX matching. The |
203 algorithm-independent fashion, and to show that Sulzmann and Lu's |
203 challenge then is to specify that value, in an algorithm-independent |
204 derivative-based algorithm does indeed calculate a value that is |
204 fashion, and to show that Sulzmann and Lu's derivative-based algorithm |
205 correct according to the specification. |
205 does indeed calculate a value that is correct according to the |
|
206 specification. |
206 |
207 |
207 The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a |
208 The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a |
208 relation (called an ``order relation'') on the set of values of @{term |
209 relation (called an ``order relation'') on the set of values of @{term |
209 r}, and to show that (once a string to be matched is chosen) there is |
210 r}, and to show that (once a string to be matched is chosen) there is |
210 a maximum element and that it is computed by their derivative-based |
211 a maximum element and that it is computed by their derivative-based |
933 yet. Unfortunately, we were not able to verify claims that their |
934 yet. Unfortunately, we were not able to verify claims that their |
934 ordering has properties such as being transitive or having maximal |
935 ordering has properties such as being transitive or having maximal |
935 elements. |
936 elements. |
936 |
937 |
937 Okui and Suzuki \cite{OkuiSuzuki2010,OkuiSuzukiTech} described |
938 Okui and Suzuki \cite{OkuiSuzuki2010,OkuiSuzukiTech} described |
938 another ordering of values, which they use to establish the correctness of |
939 another ordering of values, which they use to establish the |
939 their automata-based algorithm for POSIX matching. Their ordering |
940 correctness of their automata-based algorithm for POSIX matching. |
940 resembles some aspects of the one given by Sulzmann and Lu, but |
941 Their ordering resembles some aspects of the one given by Sulzmann |
941 is quite different. To begin with, Okui and Suzuki identify POSIX |
942 and Lu, but is quite different. To begin with, Okui and Suzuki |
942 values as least elements in their ordering. A more substantial |
943 identify POSIX values as least, rather than maximal, elements in |
943 difference is that the ordering by Okui |
944 their ordering. A more substantial difference is that the ordering |
944 and Suzuki uses \emph{positions} in order to identify and |
945 by Okui and Suzuki uses \emph{positions} in order to identify and |
945 compare subvalues, whereby positions are lists of natural |
946 compare subvalues. Positions are lists of natural numbers. This |
946 numbers. This allows them to quite naturally formalise the Longest |
947 allows them to quite naturally formalise the Longest Match and |
947 Match and Priority rules of the informal POSIX standard. Consider |
948 Priority rules of the informal POSIX standard. Consider for example |
948 for example the value @{term v} of the form @{term "Stars [Seq |
949 the value @{term v} |
949 (Char x) (Char y), Char z]"}, say. At position @{text "[0,1]"} of |
950 |
950 this value is the subvalue @{text "Char y"} and at position @{text |
951 \begin{center} |
951 "[1]"} the subvalue @{term "Char z"}. At the `root' position, or |
952 @{term "v == Stars [Seq (Char x) (Char y), Char z]"} |
952 empty list @{term "[]"}, is the whole value @{term v}. The |
953 \end{center} |
953 positions @{text "[0,1,0]"} and @{text "[2]"}, for example, are |
954 |
954 outside of @{text v}. If it exists, the subvalue of @{term v} at a |
955 \noindent |
955 position @{text p}, written @{term "at v p"}, can be recursively |
956 At position @{text "[0,1]"} of this value is the |
956 defined by |
957 subvalue @{text "Char y"} and at position @{text "[1]"} the |
|
958 subvalue @{term "Char z"}. At the `root' position, or empty list |
|
959 @{term "[]"}, is the whole value @{term v}. The positions @{text |
|
960 "[0,1,0]"} and @{text "[2]"}, for example, are outside of @{text |
|
961 v}. If it exists, the subvalue of @{term v} at a position @{text |
|
962 p}, written @{term "at v p"}, can be recursively defined by |
957 |
963 |
958 \begin{center} |
964 \begin{center} |
959 \begin{tabular}{r@ {\hspace{0mm}}lcl} |
965 \begin{tabular}{r@ {\hspace{0mm}}lcl} |
960 @{term v} & @{text "\<downharpoonleft>\<^bsub>[]\<^esub>"} & @{text "\<equiv>"}& @{thm (rhs) at.simps(1)}\\ |
966 @{term v} & @{text "\<downharpoonleft>\<^bsub>[]\<^esub>"} & @{text "\<equiv>"}& @{thm (rhs) at.simps(1)}\\ |
961 @{term "Left v"} & @{text "\<downharpoonleft>\<^bsub>0::ps\<^esub>"} & @{text "\<equiv>"}& @{thm (rhs) at.simps(2)}\\ |
967 @{term "Left v"} & @{text "\<downharpoonleft>\<^bsub>0::ps\<^esub>"} & @{text "\<equiv>"}& @{thm (rhs) at.simps(2)}\\ |
968 @{thm (rhs) at.simps(5)[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2", simplified Suc_0_fold]} \\ |
974 @{thm (rhs) at.simps(5)[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2", simplified Suc_0_fold]} \\ |
969 @{term "Stars vs"} & @{text "\<downharpoonleft>\<^bsub>n::ps\<^esub>"} & @{text "\<equiv>"}& @{thm (rhs) at.simps(6)}\\ |
975 @{term "Stars vs"} & @{text "\<downharpoonleft>\<^bsub>n::ps\<^esub>"} & @{text "\<equiv>"}& @{thm (rhs) at.simps(6)}\\ |
970 \end{tabular} |
976 \end{tabular} |
971 \end{center} |
977 \end{center} |
972 |
978 |
973 \noindent We use Isabelle's notation @{term "vs ! n"} for the |
979 \noindent In the last clause we use Isabelle's notation @{term "vs ! n"} for the |
974 @{text n}th element in a list. The set of positions inside a value @{text v}, |
980 @{text n}th element in a list. The set of positions inside a value @{text v}, |
975 written @{term "Pos v"}, is given by the clauses |
981 written @{term "Pos v"}, is given by the clauses |
976 |
982 |
977 \begin{center} |
983 \begin{center} |
978 \begin{tabular}{lcl} |
984 \begin{tabular}{lcl} |
1115 The reasoning in the other cases is similar.\qed |
1121 The reasoning in the other cases is similar.\qed |
1116 \end{proof} |
1122 \end{proof} |
1117 |
1123 |
1118 \noindent We can show that @{term "DUMMY :\<sqsubseteq>val DUMMY"} is |
1124 \noindent We can show that @{term "DUMMY :\<sqsubseteq>val DUMMY"} is |
1119 a partial order. Okui and Suzuki also show that it is a linear order |
1125 a partial order. Okui and Suzuki also show that it is a linear order |
1120 for lexical values \cite{OkuiSuzuki2010}, but we have not done |
1126 for lexical values \cite{OkuiSuzuki2010} of a given regular |
1121 this. What we are going to show below is that for a given @{text r} |
1127 expression and given string, but we have not done this. It is not |
1122 and @{text s}, the ordering has a unique minimal element on the set |
1128 essential for our results. What we are going to show below is that |
1123 @{term "LV r s"} , which is the POSIX value we defined in the |
1129 for a given @{text r} and @{text s}, the ordering has a unique |
1124 previous section. |
1130 minimal element on the set @{term "LV r s"}, which is the POSIX value |
|
1131 we defined in the previous section. |
1125 |
1132 |
1126 |
1133 |
1127 Lemma 1 |
1134 Lemma 1 |
1128 |
1135 |
1129 @{thm [mode=IfThen] PosOrd_shorterE[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} |
1136 @{thm [mode=IfThen] PosOrd_shorterE[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} |
1143 |
1150 |
1144 \begin{theorem} |
1151 \begin{theorem} |
1145 @{thm [mode=IfThen] Posix_PosOrd[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} |
1152 @{thm [mode=IfThen] Posix_PosOrd[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} |
1146 \end{theorem} |
1153 \end{theorem} |
1147 |
1154 |
1148 \begin{proof} |
1155 \begin{proof} By induction on our POSIX rules. It is clear that |
1149 By induction on our POSIX rules. The two base cases are straightforward: for example |
1156 @{text "v\<^sub>1"} and @{text "v\<^sub>2"} have the same underlying |
1150 for @{term "v\<^sub>1 = Void"}, we have that @{term "v\<^sub>2 \<in> LV ONE []"} must also |
1157 string. |
1151 be of the form \mbox{@{term "v\<^sub>2 = Void"}}. Therfore we have @{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"}. |
1158 |
1152 The inductive cases are as follows: |
1159 The two base cases are straightforward: for example for @{term |
1153 |
1160 "v\<^sub>1 = Void"}, we have that @{term "v\<^sub>2 \<in> LV ONE |
1154 \begin{itemize} |
1161 []"} must also be of the form \mbox{@{term "v\<^sub>2 = |
1155 \item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2) \<rightarrow> (Left w\<^sub>1)"}: |
1162 Void"}}. Therefore we have @{term "v\<^sub>1 :\<sqsubseteq>val |
1156 In this case @{term "v\<^sub>1 = Left w\<^sub>1"} and the value @{term "v\<^sub>2"} is either |
1163 v\<^sub>2"}. The inductive cases are as follows: |
1157 of the form @{term "Left w\<^sub>2"} or @{term "Right w\<^sub>2"}. In the latter case we |
1164 |
1158 can immediately conclude with @{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"} since a @{text Left}-value |
1165 \begin{itemize} \item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1 |
1159 with the same underlying string @{text s} is always smaller or equal than a @{text Right}-value. |
1166 r\<^sub>2) \<rightarrow> (Left w\<^sub>1)"}: In this case @{term |
1160 In the former case we have @{term "w\<^sub>2 \<in> LV r\<^sub>1 s"} and can use the induction |
1167 "v\<^sub>1 = Left w\<^sub>1"} and the value @{term "v\<^sub>2"} is |
1161 hypothesis to infer @{term "w\<^sub>1 :\<sqsubseteq>val w\<^sub>2"}. Because @{term "w\<^sub>1"} |
1168 either of the form @{term "Left w\<^sub>2"} or @{term "Right |
1162 and @{term "w\<^sub>2"} have the same underlying string @{text s}, we can conclude with |
1169 w\<^sub>2"}. In the latter case we can immediately conclude with |
1163 @{term "Left w\<^sub>1 :\<sqsubseteq>val Left w\<^sub>2"}. |
1170 @{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"} since a @{text |
1164 |
1171 Left}-value with the same underlying string @{text s} is always |
1165 \item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2) \<rightarrow> (Right w\<^sub>1)"}: |
1172 smaller or equal than a @{text Right}-value. In the former case we |
1166 Similarly for the case where |
1173 have @{term "w\<^sub>2 \<in> LV r\<^sub>1 s"} and can use the |
1167 @{term "v\<^sub>1 = Right w\<^sub>1"}. |
1174 induction hypothesis to infer @{term "w\<^sub>1 :\<sqsubseteq>val |
1168 |
1175 w\<^sub>2"}. Because @{term "w\<^sub>1"} and @{term "w\<^sub>2"} |
1169 \item[$\bullet$] |
1176 have the same underlying string @{text s}, we can conclude with |
|
1177 @{term "Left w\<^sub>1 :\<sqsubseteq>val Left w\<^sub>2"}.\smallskip |
|
1178 |
|
1179 \item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2) |
|
1180 \<rightarrow> (Right w\<^sub>1)"}: This case similar as the previous |
|
1181 case, except that we know that @{term "s \<notin> L |
|
1182 r\<^sub>1"}. This is needed when @{term "v\<^sub>2 = Left |
|
1183 w\<^sub>2"}: since \mbox{@{term "flat v\<^sub>2 = flat w\<^sub>2"} |
|
1184 @{text "= s"}} and @{term "\<Turnstile> w\<^sub>2 : r\<^sub>1"}, we |
|
1185 can derive a contradiction using Prop.~\ref{inhabs}. So also in this |
|
1186 case \mbox{@{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"}}.\smallskip |
|
1187 |
|
1188 \item[$\bullet$] Case @{term "(s\<^sub>1 @ s\<^sub>2) \<in> (SEQ r\<^sub>1 r\<^sub>2) |
|
1189 \<rightarrow> (Seq w\<^sub>1 w\<^sub>2)"}: Assume @{term "v\<^sub>2 = |
|
1190 Seq (u\<^sub>1) (u\<^sub>2)"} with @{term "\<Turnstile> u\<^sub>1 : r\<^sub>1"} |
|
1191 and \mbox{@{term "\<Turnstile> u\<^sub>2 : r\<^sub>2"}}. We have |
|
1192 |
1170 \end{itemize} |
1193 \end{itemize} |
1171 \end{proof} |
1194 \end{proof} |
1172 |
1195 |
1173 Given a lexical value @{text "v\<^sub>1"}, say, in @{term "LV r s"} for which there does |
1196 Given a lexical value @{text "v\<^sub>1"}, say, in @{term "LV r s"} for which there does |
1174 not exists any smaller value in @{term "LV r s"}, then this value is our POSIX value: |
1197 not exists any smaller value in @{term "LV r s"}, then this value is our POSIX value: |