108 completely formalised correctness proof of this matcher in for example HOL4 |
108 completely formalised correctness proof of this matcher in for example HOL4 |
109 has been mentioned by Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part |
109 has been mentioned by Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part |
110 of the work by Krauss and Nipkow \cite{Krauss2011}. And another one in Coq is given |
110 of the work by Krauss and Nipkow \cite{Krauss2011}. And another one in Coq is given |
111 by Coquand and Siles \cite{Coquand2012}. |
111 by Coquand and Siles \cite{Coquand2012}. |
112 |
112 |
113 One limitation of Brzozowski's matcher is that it only generates a YES/NO |
113 One limitation of Brzozowski's matcher is that it only generates a |
114 answer for whether a string is being matched by a regular expression. |
114 YES/NO answer for whether a string is being matched by a regular |
115 Sulzmann and Lu \cite{Sulzmann2014} extended this matcher to allow |
115 expression. Sulzmann and Lu \cite{Sulzmann2014} extended this matcher |
116 generation not just of a YES/NO answer but of an actual matching, called a |
116 to allow generation not just of a YES/NO answer but of an actual |
117 [lexical] {\em value}. They give a simple algorithm to calculate a value |
117 matching, called a [lexical] {\em value}. They give a simple algorithm |
118 that appears to be the value associated with POSIX matching |
118 to calculate a value that appears to be the value associated with |
119 \cite{Kuklewicz,Vansummeren2006}. The challenge then is to specify that |
119 POSIX matching. The challenge then is to specify that value, in an |
120 value, in an algorithm-independent fashion, and to show that Sulzmann and |
120 algorithm-independent fashion, and to show that Sulzmann and Lu's |
121 Lu's derivative-based algorithm does indeed calculate a value that is |
121 derivative-based algorithm does indeed calculate a value that is |
122 correct according to the specification. |
122 correct according to the specification. |
123 |
123 |
124 The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a |
124 The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a |
125 relation (called an ``order relation'') on the set of values of @{term |
125 relation (called an ``order relation'') on the set of values of @{term |
126 r}, and to show that (once a string to be matched is chosen) there is |
126 r}, and to show that (once a string to be matched is chosen) there is |
303 @{thm (lhs) nullable.simps(2)} & $\dn$ & @{thm (rhs) nullable.simps(2)}\\ |
303 @{thm (lhs) nullable.simps(2)} & $\dn$ & @{thm (rhs) nullable.simps(2)}\\ |
304 @{thm (lhs) nullable.simps(3)} & $\dn$ & @{thm (rhs) nullable.simps(3)}\\ |
304 @{thm (lhs) nullable.simps(3)} & $\dn$ & @{thm (rhs) nullable.simps(3)}\\ |
305 @{thm (lhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\ |
305 @{thm (lhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\ |
306 @{thm (lhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\ |
306 @{thm (lhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\ |
307 @{thm (lhs) nullable.simps(6)} & $\dn$ & @{thm (rhs) nullable.simps(6)}\medskip\\ |
307 @{thm (lhs) nullable.simps(6)} & $\dn$ & @{thm (rhs) nullable.simps(6)}\medskip\\ |
308 \end{tabular} |
308 |
309 \end{center} |
309 %\end{tabular} |
310 |
310 %\end{center} |
311 \begin{center} |
311 |
312 \begin{tabular}{lcl} |
312 %\begin{center} |
|
313 %\begin{tabular}{lcl} |
|
314 |
313 @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\ |
315 @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\ |
314 @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\ |
316 @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\ |
315 @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\ |
317 @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\ |
316 @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\ |
318 @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\ |
317 @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]}\\ |
319 @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]}\\ |
606 functional programming language and also in Isabelle/HOL. In the remaining |
608 functional programming language and also in Isabelle/HOL. In the remaining |
607 part of this section we prove that this algorithm is correct. |
609 part of this section we prove that this algorithm is correct. |
608 |
610 |
609 The well-known idea of POSIX matching is informally defined by the longest |
611 The well-known idea of POSIX matching is informally defined by the longest |
610 match and priority rule (see Introduction); as correctly argued in \cite{Sulzmann2014}, this |
612 match and priority rule (see Introduction); as correctly argued in \cite{Sulzmann2014}, this |
611 needs formal specification. Sulzmann and Lu define a \emph{dominance} |
613 needs formal specification. Sulzmann and Lu define an ``ordering |
612 relation\footnote{Sulzmann and Lu call it an ordering relation, but |
614 relation'' between values and argue |
613 without giving evidence that it is transitive.} between values and argue |
|
614 that there is a maximum value, as given by the derivative-based algorithm. |
615 that there is a maximum value, as given by the derivative-based algorithm. |
615 In contrast, we shall introduce a simple inductive definition that |
616 In contrast, we shall introduce a simple inductive definition that |
616 specifies directly what a \emph{POSIX value} is, incorporating the |
617 specifies directly what a \emph{POSIX value} is, incorporating the |
617 POSIX-specific choices into the side-conditions of our rules. Our |
618 POSIX-specific choices into the side-conditions of our rules. Our |
618 definition is inspired by the matching relation given by Vansummeren |
619 definition is inspired by the matching relation given by Vansummeren |
854 |
855 |
855 \noindent is well understood, there is an obstacle with the POSIX value |
856 \noindent is well understood, there is an obstacle with the POSIX value |
856 calculation algorithm by Sulzmann and Lu: if we build a derivative regular |
857 calculation algorithm by Sulzmann and Lu: if we build a derivative regular |
857 expression and then simplify it, we will calculate a POSIX value for this |
858 expression and then simplify it, we will calculate a POSIX value for this |
858 simplified derivative regular expression, \emph{not} for the original (unsimplified) |
859 simplified derivative regular expression, \emph{not} for the original (unsimplified) |
859 derivative regular expression. Sulzmann and Lu overcome this obstacle by |
860 derivative regular expression. Sulzmann and Lu \cite{Sulzmann2014} overcome this obstacle by |
860 not just calculating a simplified regular expression, but also calculating |
861 not just calculating a simplified regular expression, but also calculating |
861 a \emph{rectification function} that ``repairs'' the incorrect value. |
862 a \emph{rectification function} that ``repairs'' the incorrect value. |
862 |
863 |
863 The rectification functions can be (slightly clumsily) implemented in |
864 The rectification functions can be (slightly clumsily) implemented in |
864 Isabelle/HOL as follows using some auxiliary functions: |
865 Isabelle/HOL as follows using some auxiliary functions: |
1140 that there are more serious problems. |
1141 that there are more serious problems. |
1141 |
1142 |
1142 Having proved the correctness of the POSIX lexing algorithm in |
1143 Having proved the correctness of the POSIX lexing algorithm in |
1143 \cite{Sulzmann2014}, which lessons have we learned? Well, this is a |
1144 \cite{Sulzmann2014}, which lessons have we learned? Well, this is a |
1144 perfect example for the importance of the \emph{right} definitions. We |
1145 perfect example for the importance of the \emph{right} definitions. We |
1145 have (on and off) banged our heads on doors as soon as as first versions |
1146 have (on and off) banged our heads on doors as soon as first versions |
1146 of \cite{Sulzmann2014} appeared, but have made little progress with |
1147 of \cite{Sulzmann2014} appeared, but have made little progress with |
1147 turning the relatively detailed proof sketch in \cite{Sulzmann2014} into a |
1148 turning the relatively detailed proof sketch in \cite{Sulzmann2014} into a |
1148 formalisable proof. Having seen \cite{Vansummeren2006} and adapting the |
1149 formalisable proof. Having seen \cite{Vansummeren2006} and adapted the |
1149 POSIX definition given there for the algorithm by Sulzmann and Lu made all |
1150 POSIX definition given there for the algorithm by Sulzmann and Lu made all |
1150 the difference: the proofs, as said, are nearly straightforward. The |
1151 the difference: the proofs, as said, are nearly straightforward. The |
1151 question remains whether the original proof idea of \cite{Sulzmann2014}, |
1152 question remains whether the original proof idea of \cite{Sulzmann2014}, |
1152 potentially using our result as a stepping stone, can be made to work? |
1153 potentially using our result as a stepping stone, can be made to work? |
1153 Alas, we really do not know despite considerable effort and door banging. |
1154 Alas, we really do not know despite considerable effort and door banging. |