--- a/thys/Journal/Paper.thy Fri Aug 18 14:51:29 2017 +0100
+++ b/thys/Journal/Paper.thy Fri Aug 25 15:05:20 2017 +0200
@@ -197,12 +197,13 @@
YES/NO answer for whether a string is being matched by a regular
expression. Sulzmann and Lu~\cite{Sulzmann2014} extended this matcher
to allow generation not just of a YES/NO answer but of an actual
-matching, called a [lexical] {\em value}. They give a simple algorithm
-to calculate a value that appears to be the value associated with
-POSIX matching. The challenge then is to specify that value, in an
-algorithm-independent fashion, and to show that Sulzmann and Lu's
-derivative-based algorithm does indeed calculate a value that is
-correct according to the specification.
+matching, called a [lexical] {\em value}. \marginpar{explain values;
+who introduced them} They give a simple algorithm to calculate a value
+that appears to be the value associated with POSIX matching. The
+challenge then is to specify that value, in an algorithm-independent
+fashion, and to show that Sulzmann and Lu's derivative-based algorithm
+does indeed calculate a value that is correct according to the
+specification.
The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a
relation (called an ``order relation'') on the set of values of @{term
@@ -480,7 +481,7 @@
a regular expression correspond to the language of a regular
expression, namely
- \begin{proposition}
+ \begin{proposition}\label{inhabs}
@{thm L_flat_Prf}
\end{proposition}
@@ -935,25 +936,30 @@
elements.
Okui and Suzuki \cite{OkuiSuzuki2010,OkuiSuzukiTech} described
- another ordering of values, which they use to establish the correctness of
- their automata-based algorithm for POSIX matching. Their ordering
- resembles some aspects of the one given by Sulzmann and Lu, but
- is quite different. To begin with, Okui and Suzuki identify POSIX
- values as least elements in their ordering. A more substantial
- difference is that the ordering by Okui
- and Suzuki uses \emph{positions} in order to identify and
- compare subvalues, whereby positions are lists of natural
- numbers. This allows them to quite naturally formalise the Longest
- Match and Priority rules of the informal POSIX standard. Consider
- for example the value @{term v} of the form @{term "Stars [Seq
- (Char x) (Char y), Char z]"}, say. At position @{text "[0,1]"} of
- this value is the subvalue @{text "Char y"} and at position @{text
- "[1]"} the subvalue @{term "Char z"}. At the `root' position, or
- empty list @{term "[]"}, is the whole value @{term v}. The
- positions @{text "[0,1,0]"} and @{text "[2]"}, for example, are
- outside of @{text v}. If it exists, the subvalue of @{term v} at a
- position @{text p}, written @{term "at v p"}, can be recursively
- defined by
+ another ordering of values, which they use to establish the
+ correctness of their automata-based algorithm for POSIX matching.
+ Their ordering resembles some aspects of the one given by Sulzmann
+ and Lu, but is quite different. To begin with, Okui and Suzuki
+ identify POSIX values as least, rather than maximal, elements in
+ their ordering. A more substantial difference is that the ordering
+ by Okui and Suzuki uses \emph{positions} in order to identify and
+ compare subvalues. Positions are lists of natural numbers. This
+ allows them to quite naturally formalise the Longest Match and
+ Priority rules of the informal POSIX standard. Consider for example
+ the value @{term v}
+
+ \begin{center}
+ @{term "v == Stars [Seq (Char x) (Char y), Char z]"}
+ \end{center}
+
+ \noindent
+ At position @{text "[0,1]"} of this value is the
+ subvalue @{text "Char y"} and at position @{text "[1]"} the
+ subvalue @{term "Char z"}. At the `root' position, or empty list
+ @{term "[]"}, is the whole value @{term v}. The positions @{text
+ "[0,1,0]"} and @{text "[2]"}, for example, are outside of @{text
+ v}. If it exists, the subvalue of @{term v} at a position @{text
+ p}, written @{term "at v p"}, can be recursively defined by
\begin{center}
\begin{tabular}{r@ {\hspace{0mm}}lcl}
@@ -970,7 +976,7 @@
\end{tabular}
\end{center}
- \noindent We use Isabelle's notation @{term "vs ! n"} for the
+ \noindent In the last clause we use Isabelle's notation @{term "vs ! n"} for the
@{text n}th element in a list. The set of positions inside a value @{text v},
written @{term "Pos v"}, is given by the clauses
@@ -1117,11 +1123,12 @@
\noindent We can show that @{term "DUMMY :\<sqsubseteq>val DUMMY"} is
a partial order. Okui and Suzuki also show that it is a linear order
- for lexical values \cite{OkuiSuzuki2010}, but we have not done
- this. What we are going to show below is that for a given @{text r}
- and @{text s}, the ordering has a unique minimal element on the set
- @{term "LV r s"} , which is the POSIX value we defined in the
- previous section.
+ for lexical values \cite{OkuiSuzuki2010} of a given regular
+ expression and given string, but we have not done this. It is not
+ essential for our results. What we are going to show below is that
+ for a given @{text r} and @{text s}, the ordering has a unique
+ minimal element on the set @{term "LV r s"}, which is the POSIX value
+ we defined in the previous section.
Lemma 1
@@ -1145,28 +1152,44 @@
@{thm [mode=IfThen] Posix_PosOrd[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]}
\end{theorem}
- \begin{proof}
- By induction on our POSIX rules. The two base cases are straightforward: for example
- for @{term "v\<^sub>1 = Void"}, we have that @{term "v\<^sub>2 \<in> LV ONE []"} must also
- be of the form \mbox{@{term "v\<^sub>2 = Void"}}. Therfore we have @{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"}.
- The inductive cases are as follows:
+ \begin{proof} By induction on our POSIX rules. It is clear that
+ @{text "v\<^sub>1"} and @{text "v\<^sub>2"} have the same underlying
+ string.
+
+ The two base cases are straightforward: for example for @{term
+ "v\<^sub>1 = Void"}, we have that @{term "v\<^sub>2 \<in> LV ONE
+ []"} must also be of the form \mbox{@{term "v\<^sub>2 =
+ Void"}}. Therefore we have @{term "v\<^sub>1 :\<sqsubseteq>val
+ v\<^sub>2"}. The inductive cases are as follows:
- \begin{itemize}
- \item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2) \<rightarrow> (Left w\<^sub>1)"}:
- In this case @{term "v\<^sub>1 = Left w\<^sub>1"} and the value @{term "v\<^sub>2"} is either
- of the form @{term "Left w\<^sub>2"} or @{term "Right w\<^sub>2"}. In the latter case we
- can immediately conclude with @{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"} since a @{text Left}-value
- with the same underlying string @{text s} is always smaller or equal than a @{text Right}-value.
- In the former case we have @{term "w\<^sub>2 \<in> LV r\<^sub>1 s"} and can use the induction
- hypothesis to infer @{term "w\<^sub>1 :\<sqsubseteq>val w\<^sub>2"}. Because @{term "w\<^sub>1"}
- and @{term "w\<^sub>2"} have the same underlying string @{text s}, we can conclude with
- @{term "Left w\<^sub>1 :\<sqsubseteq>val Left w\<^sub>2"}.
+ \begin{itemize} \item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1
+ r\<^sub>2) \<rightarrow> (Left w\<^sub>1)"}: In this case @{term
+ "v\<^sub>1 = Left w\<^sub>1"} and the value @{term "v\<^sub>2"} is
+ either of the form @{term "Left w\<^sub>2"} or @{term "Right
+ w\<^sub>2"}. In the latter case we can immediately conclude with
+ @{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"} since a @{text
+ Left}-value with the same underlying string @{text s} is always
+ smaller or equal than a @{text Right}-value. In the former case we
+ have @{term "w\<^sub>2 \<in> LV r\<^sub>1 s"} and can use the
+ induction hypothesis to infer @{term "w\<^sub>1 :\<sqsubseteq>val
+ w\<^sub>2"}. Because @{term "w\<^sub>1"} and @{term "w\<^sub>2"}
+ have the same underlying string @{text s}, we can conclude with
+ @{term "Left w\<^sub>1 :\<sqsubseteq>val Left w\<^sub>2"}.\smallskip
- \item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2) \<rightarrow> (Right w\<^sub>1)"}:
- Similarly for the case where
- @{term "v\<^sub>1 = Right w\<^sub>1"}.
+ \item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2)
+ \<rightarrow> (Right w\<^sub>1)"}: This case similar as the previous
+ case, except that we know that @{term "s \<notin> L
+ r\<^sub>1"}. This is needed when @{term "v\<^sub>2 = Left
+ w\<^sub>2"}: since \mbox{@{term "flat v\<^sub>2 = flat w\<^sub>2"}
+ @{text "= s"}} and @{term "\<Turnstile> w\<^sub>2 : r\<^sub>1"}, we
+ can derive a contradiction using Prop.~\ref{inhabs}. So also in this
+ case \mbox{@{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"}}.\smallskip
- \item[$\bullet$]
+ \item[$\bullet$] Case @{term "(s\<^sub>1 @ s\<^sub>2) \<in> (SEQ r\<^sub>1 r\<^sub>2)
+ \<rightarrow> (Seq w\<^sub>1 w\<^sub>2)"}: Assume @{term "v\<^sub>2 =
+ Seq (u\<^sub>1) (u\<^sub>2)"} with @{term "\<Turnstile> u\<^sub>1 : r\<^sub>1"}
+ and \mbox{@{term "\<Turnstile> u\<^sub>2 : r\<^sub>2"}}. We have
+
\end{itemize}
\end{proof}