thys/Paper/Paper.thy
changeset 178 2835d13be702
parent 177 e85d10b238d0
child 179 85766e408c66
equal deleted inserted replaced
177:e85d10b238d0 178:2835d13be702
   198 
   198 
   199 
   199 
   200 \noindent {\bf Contributions:} We have implemented in Isabelle/HOL the
   200 \noindent {\bf Contributions:} We have implemented in Isabelle/HOL the
   201 derivative-based regular expression matching algorithm of
   201 derivative-based regular expression matching algorithm of
   202 Sulzmann and Lu \cite{Sulzmann2014}. We have proved the correctness of this
   202 Sulzmann and Lu \cite{Sulzmann2014}. We have proved the correctness of this
   203 algorithm according to our specification of what a POSIX value is. Sulzmann
   203 algorithm according to our specification of what a POSIX value is (inspired
       
   204 by work of Vansummeren \cite{Vansummeren2006}). Sulzmann
   204 and Lu sketch in \cite{Sulzmann2014} an informal correctness proof: but to
   205 and Lu sketch in \cite{Sulzmann2014} an informal correctness proof: but to
   205 us it contains unfillable gaps.\footnote{An extended version of
   206 us it contains unfillable gaps.\footnote{An extended version of
   206 \cite{Sulzmann2014} is available at the website of its first author; this
   207 \cite{Sulzmann2014} is available at the website of its first author; this
   207 extended version already includes remarks in the appendix that their
   208 extended version already includes remarks in the appendix that their
   208 informal proof contains gaps, and possible fixes are not fully worked out.}
   209 informal proof contains gaps, and possible fixes are not fully worked out.}
   240 
   241 
   241   \noindent where @{const ZERO} stands for the regular expression that does
   242   \noindent where @{const ZERO} stands for the regular expression that does
   242   not match any string, @{const ONE} for the regular expression that matches
   243   not match any string, @{const ONE} for the regular expression that matches
   243   only the empty string and @{term c} for matching a character literal. The
   244   only the empty string and @{term c} for matching a character literal. The
   244   language of a regular expression is also defined as usual by the
   245   language of a regular expression is also defined as usual by the
   245   recursive function @{term L} with the clauses:
   246   recursive function @{term L} with the six clauses:
   246 
   247 
   247   \begin{center}
   248   \begin{center}
   248   \begin{tabular}{l@ {\hspace{3mm}}rcl}
   249   \begin{tabular}{l@ {\hspace{3mm}}rcl}
   249   (1) & @{thm (lhs) L.simps(1)} & $\dn$ & @{thm (rhs) L.simps(1)}\\
   250   (1) & @{thm (lhs) L.simps(1)} & $\dn$ & @{thm (rhs) L.simps(1)}\\
   250   (2) & @{thm (lhs) L.simps(2)} & $\dn$ & @{thm (rhs) L.simps(2)}\\
   251   (2) & @{thm (lhs) L.simps(2)} & $\dn$ & @{thm (rhs) L.simps(2)}\\
   490   \end{center}
   491   \end{center}
   491 
   492 
   492   \noindent Note that this function needs only to be partially defined,
   493   \noindent Note that this function needs only to be partially defined,
   493   namely only for regular expressions that are nullable. In case @{const
   494   namely only for regular expressions that are nullable. In case @{const
   494   nullable} fails, the string @{term "[a,b,c]"} cannot be matched by @{term
   495   nullable} fails, the string @{term "[a,b,c]"} cannot be matched by @{term
   495   "r\<^sub>1"} and an error is raised instead. Note also how this function
   496   "r\<^sub>1"} and the null value @{term "None"} is returned. Note also how this function
   496   makes some subtle choices leading to a POSIX value: for example if an
   497   makes some subtle choices leading to a POSIX value: for example if an
   497   alternative regular expression, say @{term "ALT r\<^sub>1 r\<^sub>2"}, can
   498   alternative regular expression, say @{term "ALT r\<^sub>1 r\<^sub>2"}, can
   498   match the empty string and furthermore @{term "r\<^sub>1"} can match the
   499   match the empty string and furthermore @{term "r\<^sub>1"} can match the
   499   empty string, then we return a @{text Left}-value. The @{text
   500   empty string, then we return a @{text Left}-value. The @{text
   500   Right}-value will only be returned if @{term "r\<^sub>1"} cannot match the empty
   501   Right}-value will only be returned if @{term "r\<^sub>1"} cannot match the empty
   772   \end{proof}
   773   \end{proof}
   773 
   774 
   774   \noindent
   775   \noindent
   775   With Lem.~\ref{Posix2} in place, it is completely routine to establish
   776   With Lem.~\ref{Posix2} in place, it is completely routine to establish
   776   that the Sulzmann and Lu lexer satisfies our specification (returning
   777   that the Sulzmann and Lu lexer satisfies our specification (returning
   777   an ``error'' iff the string is not in the language of the regular expression,
   778   the null value @{term "None"} iff the string is not in the language of the regular expression,
   778   and returning a unique POSIX value iff the string \emph{is} in the language):
   779   and returning a unique POSIX value iff the string \emph{is} in the language):
   779 
   780 
   780   \begin{theorem}\mbox{}\smallskip\\
   781   \begin{theorem}\mbox{}\smallskip\\
   781   \begin{tabular}{ll}
   782   \begin{tabular}{ll}
   782   (1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\
   783   (1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\