lexing: comparison thys/Paper/Paper.thy

-:cdc0bdcfba3f
+:80fe81a28a52
 completely formalised correctness proof of this matcher in for example HOL4
 has been mentioned by Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part
 of the work by Krauss and Nipkow \cite{Krauss2011}. And another one in Coq is given
 by Coquand and Siles \cite{Coquand2012}.
-One limitation of Brzozowski's matcher is that it only generates a YES/NO
+One limitation of Brzozowski's matcher is that it only generates a
-answer for whether a string is being matched by a regular expression.
+YES/NO answer for whether a string is being matched by a regular
-Sulzmann and Lu \cite{Sulzmann2014} extended this matcher to allow
+expression.  Sulzmann and Lu \cite{Sulzmann2014} extended this matcher
-generation not just of a YES/NO answer but of an actual matching, called a
+to allow generation not just of a YES/NO answer but of an actual
-[lexical] {\em value}. They give a simple algorithm to calculate a value
+matching, called a [lexical] {\em value}. They give a simple algorithm
-that appears to be the value associated with POSIX matching
+to calculate a value that appears to be the value associated with
-\cite{Kuklewicz,Vansummeren2006}. The challenge then is to specify that
+POSIX matching. The challenge then is to specify that value, in an
-value, in an algorithm-independent fashion, and to show that Sulzmann and
+algorithm-independent fashion, and to show that Sulzmann and Lu's
-Lu's derivative-based algorithm does indeed calculate a value that is
+derivative-based algorithm does indeed calculate a value that is
 correct according to the specification.
 The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a
 relation (called an ``order relation'') on the set of values of @{term
 r}, and to show that (once a string to be matched is chosen) there is
 @{thm (lhs) nullable.simps(2)} & $\dn$ & @{thm (rhs) nullable.simps(2)}\\
 @{thm (lhs) nullable.simps(3)} & $\dn$ & @{thm (rhs) nullable.simps(3)}\\
 @{thm (lhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
 @{thm (lhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\
 @{thm (lhs) nullable.simps(6)} & $\dn$ & @{thm (rhs) nullable.simps(6)}\medskip\\
-\end{tabular}
-\end{center}
+%\end{tabular}
+%\end{center}
-\begin{center}
-\begin{tabular}{lcl}
+%\begin{center}
+%\begin{tabular}{lcl}
 @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\
 @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\
 @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\
 @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\
 @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]}\\
 functional programming language and also in Isabelle/HOL. In the remaining
 part of this section we prove that this algorithm is correct.
 The well-known idea of POSIX matching is informally defined by the longest
 match and priority rule (see Introduction); as correctly argued in \cite{Sulzmann2014}, this
-needs formal specification. Sulzmann and Lu define a \emph{dominance}
+needs formal specification. Sulzmann and Lu define an ``ordering
-relation\footnote{Sulzmann and Lu call it an ordering relation, but
+relation'' between values and argue
-without giving evidence that it is transitive.} between values and argue
 that there is a maximum value, as given by the derivative-based algorithm.
 In contrast, we shall introduce a simple inductive definition that
 specifies directly what a \emph{POSIX value} is, incorporating the
 POSIX-specific choices into the side-conditions of our rules. Our
 definition is inspired by the matching relation given by Vansummeren
 \noindent is well understood, there is an obstacle with the POSIX value
 calculation algorithm by Sulzmann and Lu: if we build a derivative regular
 expression and then simplify it, we will calculate a POSIX value for this
 simplified derivative regular expression, \emph{not} for the original (unsimplified)
-derivative regular expression. Sulzmann and Lu overcome this obstacle by
+derivative regular expression. Sulzmann and Lu \cite{Sulzmann2014} overcome this obstacle by
 not just calculating a simplified regular expression, but also calculating
 a \emph{rectification function} that ``repairs'' the incorrect value.
 The rectification functions can be (slightly clumsily) implemented  in
 Isabelle/HOL as follows using some auxiliary functions:
 that there are more serious problems.
 Having proved the correctness of the POSIX lexing algorithm in
 \cite{Sulzmann2014}, which lessons have we learned? Well, this is a
 perfect example for the importance of the \emph{right} definitions. We
-have (on and off) banged our heads on doors as soon as as first versions
+have (on and off) banged our heads on doors as soon as first versions
 of \cite{Sulzmann2014} appeared, but have made little progress with
 turning the relatively detailed proof sketch in \cite{Sulzmann2014} into a
-formalisable proof. Having seen \cite{Vansummeren2006} and adapting the
+formalisable proof. Having seen \cite{Vansummeren2006} and adapted the
 POSIX definition given there for the algorithm by Sulzmann and Lu made all
 the difference: the proofs, as said, are nearly straightforward. The
 question remains whether the original proof idea of \cite{Sulzmann2014},
 potentially using our result as a stepping stone, can be made to work?
 Alas, we really do not know despite considerable effort and door banging.

changeset 173	80fe81a28a52
parent 172	cdc0bdcfba3f
child 174	4e3778f4a802
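
Note on the `nullable` and `der` tables touched by this changeset: the diff only shows the Isabelle antiquotations (`nullable.simps`, `der.simps`), not the rendered clauses. The sketch below is a hedged illustration of the standard Brzozowski definitions those tables are expected to render to. It is not taken from the theory file. The constructor names (`Zero`, `One`, `Char`, `Alt`, `Seq`, `Star`), their correspondence to the six `simps` clauses, and the helper `matches` are assumptions made for illustration.

```python
from dataclasses import dataclass


class Rexp:
    """Base class for regular expressions (hypothetical names, see note above)."""


@dataclass(frozen=True)
class Zero(Rexp):      # matches no string
    pass


@dataclass(frozen=True)
class One(Rexp):       # matches only the empty string
    pass


@dataclass(frozen=True)
class Char(Rexp):      # matches the single character c
    c: str


@dataclass(frozen=True)
class Alt(Rexp):       # alternative r1 + r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Seq(Rexp):       # sequence r1 . r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Star(Rexp):      # Kleene star r*
    r: Rexp


def nullable(r: Rexp) -> bool:
    """True iff r can match the empty string."""
    if isinstance(r, (Zero, Char)):
        return False
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    raise TypeError(f"not a regular expression: {r!r}")


def der(c: str, r: Rexp) -> Rexp:
    """Brzozowski derivative of r with respect to the character c."""
    if isinstance(r, (Zero, One)):
        return Zero()
    if isinstance(r, Char):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        # If r1 can be empty, c may also be consumed by r2.
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    raise TypeError(f"not a regular expression: {r!r}")


def matches(r: Rexp, s: str) -> bool:
    """YES/NO matcher: take derivatives character by character, then test nullability."""
    for c in s:
        r = der(c, r)
    return nullable(r)
```

This matcher only answers YES or NO, which is the limitation the first hunk describes. The extension by Sulzmann and Lu, which additionally computes a value, is not sketched here.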