equal
deleted
inserted
replaced
198 |
198 |
199 |
199 |
200 \noindent {\bf Contributions:} We have implemented in Isabelle/HOL the |
200 \noindent {\bf Contributions:} We have implemented in Isabelle/HOL the |
201 derivative-based regular expression matching algorithm of |
201 derivative-based regular expression matching algorithm of |
202 Sulzmann and Lu \cite{Sulzmann2014}. We have proved the correctness of this |
202 Sulzmann and Lu \cite{Sulzmann2014}. We have proved the correctness of this |
203 algorithm according to our specification of what a POSIX value is. Sulzmann |
203 algorithm according to our specification of what a POSIX value is (inspired |
|
204 by work of Vansummeren \cite{Vansummeren2006}). Sulzmann |
204 and Lu sketch in \cite{Sulzmann2014} an informal correctness proof: but to |
205 and Lu sketch in \cite{Sulzmann2014} an informal correctness proof: but to |
205 us it contains unfillable gaps.\footnote{An extended version of |
206 us it contains unfillable gaps.\footnote{An extended version of |
206 \cite{Sulzmann2014} is available at the website of its first author; this |
207 \cite{Sulzmann2014} is available at the website of its first author; this |
207 extended version already includes remarks in the appendix that their |
208 extended version already includes remarks in the appendix that their |
208 informal proof contains gaps, and possible fixes are not fully worked out.} |
209 informal proof contains gaps, and possible fixes are not fully worked out.} |
240 |
241 |
241 \noindent where @{const ZERO} stands for the regular expression that does |
242 \noindent where @{const ZERO} stands for the regular expression that does |
242 not match any string, @{const ONE} for the regular expression that matches |
243 not match any string, @{const ONE} for the regular expression that matches |
243 only the empty string and @{term c} for matching a character literal. The |
244 only the empty string and @{term c} for matching a character literal. The |
244 language of a regular expression is also defined as usual by the |
245 language of a regular expression is also defined as usual by the |
245 recursive function @{term L} with the clauses: |
246 recursive function @{term L} with the six clauses: |
246 |
247 |
247 \begin{center} |
248 \begin{center} |
248 \begin{tabular}{l@ {\hspace{3mm}}rcl} |
249 \begin{tabular}{l@ {\hspace{3mm}}rcl} |
249 (1) & @{thm (lhs) L.simps(1)} & $\dn$ & @{thm (rhs) L.simps(1)}\\ |
250 (1) & @{thm (lhs) L.simps(1)} & $\dn$ & @{thm (rhs) L.simps(1)}\\ |
250 (2) & @{thm (lhs) L.simps(2)} & $\dn$ & @{thm (rhs) L.simps(2)}\\ |
251 (2) & @{thm (lhs) L.simps(2)} & $\dn$ & @{thm (rhs) L.simps(2)}\\ |
490 \end{center} |
491 \end{center} |
491 |
492 |
492 \noindent Note that this function needs only to be partially defined, |
493 \noindent Note that this function needs only to be partially defined, |
493 namely only for regular expressions that are nullable. In case @{const |
494 namely only for regular expressions that are nullable. In case @{const |
494 nullable} fails, the string @{term "[a,b,c]"} cannot be matched by @{term |
495 nullable} fails, the string @{term "[a,b,c]"} cannot be matched by @{term |
495 "r\<^sub>1"} and an error is raised instead. Note also how this function |
496 "r\<^sub>1"} and the null value @{term "None"} is returned. Note also how this function |
496 makes some subtle choices leading to a POSIX value: for example if an |
497 makes some subtle choices leading to a POSIX value: for example if an |
497 alternative regular expression, say @{term "ALT r\<^sub>1 r\<^sub>2"}, can |
498 alternative regular expression, say @{term "ALT r\<^sub>1 r\<^sub>2"}, can |
498 match the empty string and furthermore @{term "r\<^sub>1"} can match the |
499 match the empty string and furthermore @{term "r\<^sub>1"} can match the |
499 empty string, then we return a @{text Left}-value. The @{text |
500 empty string, then we return a @{text Left}-value. The @{text |
500 Right}-value will only be returned if @{term "r\<^sub>1"} cannot match the empty |
501 Right}-value will only be returned if @{term "r\<^sub>1"} cannot match the empty |
772 \end{proof} |
773 \end{proof} |
773 |
774 |
774 \noindent |
775 \noindent |
775 With Lem.~\ref{Posix2} in place, it is completely routine to establish |
776 With Lem.~\ref{Posix2} in place, it is completely routine to establish |
776 that the Sulzmann and Lu lexer satisfies our specification (returning |
777 that the Sulzmann and Lu lexer satisfies our specification (returning |
777 an ``error'' iff the string is not in the language of the regular expression, |
778 the null value @{term "None"} iff the string is not in the language of the regular expression, |
778 and returning a unique POSIX value iff the string \emph{is} in the language): |
779 and returning a unique POSIX value iff the string \emph{is} in the language): |
779 |
780 |
780 \begin{theorem}\mbox{}\smallskip\\ |
781 \begin{theorem}\mbox{}\smallskip\\ |
781 \begin{tabular}{ll} |
782 \begin{tabular}{ll} |
782 (1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\ |
783 (1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\ |