lexing: comparison thys/Journal/Paper.thy


-:6746f5e1f1f8
+:12772d537b71
 One limitation of Brzozowski's matcher is that it only generates a
 YES/NO answer for whether a string is being matched by a regular
 expression.  Sulzmann and Lu~\cite{Sulzmann2014} extended this matcher
 to allow generation not just of a YES/NO answer but of an actual
-matching, called a [lexical] {\em value}. They give a simple algorithm
+matching, called a [lexical] {\em value}. \marginpar{explain values;
-to calculate a value that appears to be the value associated with
+who introduced them} They give a simple algorithm to calculate a value
-POSIX matching.  The challenge then is to specify that value, in an
+that appears to be the value associated with POSIX matching.  The
-algorithm-independent fashion, and to show that Sulzmann and Lu's
+challenge then is to specify that value, in an algorithm-independent
-derivative-based algorithm does indeed calculate a value that is
+fashion, and to show that Sulzmann and Lu's derivative-based algorithm
-correct according to the specification.
+does indeed calculate a value that is correct according to the
+specification.
 The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a
 relation (called an ``order relation'') on the set of values of @{term
 r}, and to show that (once a string to be matched is chosen) there is
 a maximum element and that it is computed by their derivative-based
 only value associated with the regular expression @{term ONE} is
 @{term Void}.  It is routine to establish how values ``inhabiting''
 a regular expression correspond to the language of a regular
 expression, namely
-\begin{proposition}
+\begin{proposition}\label{inhabs}
 @{thm L_flat_Prf}
 \end{proposition}
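The correspondence between inhabiting values and the language can be made concrete with a small sketch. It is written in Python rather than Isabelle, and it is an illustration under assumptions, not the formal definition: the class names, the function names `flat` and `inhabits`, and the side condition that star iterations flatten to non-empty strings are all assumed here and are not taken from the text above.

```python
from dataclasses import dataclass

# Regular expressions (constructor names are illustrative assumptions).
@dataclass(frozen=True)
class ONE:
    pass

@dataclass(frozen=True)
class CHAR:
    c: str

@dataclass(frozen=True)
class ALT:
    r1: object
    r2: object

@dataclass(frozen=True)
class SEQ:
    r1: object
    r2: object

@dataclass(frozen=True)
class STAR:
    r: object

# Values.
@dataclass(frozen=True)
class Void:
    pass

@dataclass(frozen=True)
class Char:
    c: str

@dataclass(frozen=True)
class Left:
    v: object

@dataclass(frozen=True)
class Right:
    v: object

@dataclass(frozen=True)
class Seq:
    v1: object
    v2: object

@dataclass(frozen=True)
class Stars:
    vs: tuple

def flat(v):
    """Underlying string of a value (assumed standard definition)."""
    if isinstance(v, Void):
        return ""
    if isinstance(v, Char):
        return v.c
    if isinstance(v, (Left, Right)):
        return flat(v.v)
    if isinstance(v, Seq):
        return flat(v.v1) + flat(v.v2)
    if isinstance(v, Stars):
        return "".join(flat(u) for u in v.vs)
    raise TypeError(f"not a value: {v!r}")

def inhabits(v, r):
    """Check whether value v inhabits regular expression r (assumed rules)."""
    if isinstance(r, ONE):
        return isinstance(v, Void)
    if isinstance(r, CHAR):
        return isinstance(v, Char) and v.c == r.c
    if isinstance(r, ALT):
        if isinstance(v, Left):
            return inhabits(v.v, r.r1)
        if isinstance(v, Right):
            return inhabits(v.v, r.r2)
        return False
    if isinstance(r, SEQ):
        return (isinstance(v, Seq)
                and inhabits(v.v1, r.r1) and inhabits(v.v2, r.r2))
    if isinstance(r, STAR):
        # Assumption: every iteration must flatten to a non-empty string.
        return (isinstance(v, Stars)
                and all(inhabits(u, r.r) and flat(u) != "" for u in v.vs))
    return False

# Hypothetical example: (xy + z)* and a value for the string "xyz".
r = STAR(ALT(SEQ(CHAR("x"), CHAR("y")), CHAR("z")))
v = Stars((Left(Seq(Char("x"), Char("y"))), Right(Char("z"))))
```

With these definitions, `inhabits(v, r)` holds and `flat(v)` is the string `xyz`, which lies in the language of `r`; this is one instance of the direction "inhabiting values flatten to strings of the language".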
 \noindent
 Given a regular expression @{text r} and a string @{text s}, we define the
 yet. Unfortunately, we were not able to verify claims that their
 ordering has properties such as being transitive or having maximal
 elements.
 Okui and Suzuki \cite{OkuiSuzuki2010,OkuiSuzukiTech} described
-another ordering of values, which they use to establish the correctness of
+another ordering of values, which they use to establish the
-their automata-based algorithm for POSIX matching.  Their ordering
+correctness of their automata-based algorithm for POSIX matching.
-resembles some aspects of the one given by Sulzmann and Lu, but
+Their ordering resembles some aspects of the one given by Sulzmann
-is quite different. To begin with, Okui and Suzuki identify POSIX
+and Lu, but is quite different. To begin with, Okui and Suzuki
-values as least elements in their ordering. A more substantial
+identify POSIX values as least, rather than maximal, elements in
-difference is that the ordering by Okui
+their ordering. A more substantial difference is that the ordering
-and Suzuki uses \emph{positions} in order to identify and
+by Okui and Suzuki uses \emph{positions} in order to identify and
-compare subvalues, whereby positions are lists of natural
+compare subvalues. Positions are lists of natural numbers. This
-numbers. This allows them to quite naturally formalise the Longest
+allows them to quite naturally formalise the Longest Match and
-Match and Priority rules of the informal POSIX standard.  Consider
+Priority rules of the informal POSIX standard.  Consider for example
-for example the value @{term v} of the form @{term "Stars [Seq
+the value @{term v}
-(Char x) (Char y), Char z]"}, say.  At position @{text "[0,1]"} of
-this value is the subvalue @{text "Char y"} and at position @{text
+\begin{center}
-"[1]"} the subvalue @{term "Char z"}.  At the `root' position, or
+@{term "v == Stars [Seq (Char x) (Char y), Char z]"}
-empty list @{term "[]"}, is the whole value @{term v}. The
+\end{center}
-positions @{text "[0,1,0]"} and @{text "[2]"}, for example, are
-outside of @{text v}. If it exists, the subvalue of @{term v} at a
+\noindent
-position @{text p}, written @{term "at v p"}, can be recursively
+At position @{text "[0,1]"} of this value is the
-defined by
+subvalue @{text "Char y"} and at position @{text "[1]"} the
+subvalue @{term "Char z"}.  At the `root' position, or empty list
+@{term "[]"}, is the whole value @{term v}. The positions @{text
+"[0,1,0]"} and @{text "[2]"}, for example, are outside of @{text
+v}. If it exists, the subvalue of @{term v} at a position @{text
+p}, written @{term "at v p"}, can be recursively defined by
 \begin{center}
 \begin{tabular}{r@ {\hspace{0mm}}lcl}
 @{term v} &  @{text "\<downharpoonleft>\<^bsub>[]\<^esub>"} & @{text "\<equiv>"}& @{thm (rhs) at.simps(1)}\\
 @{term "Left v"} & @{text "\<downharpoonleft>\<^bsub>0::ps\<^esub>"} & @{text "\<equiv>"}& @{thm (rhs) at.simps(2)}\\
 @{thm (rhs) at.simps(5)[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2", simplified Suc_0_fold]} \\
 @{term "Stars vs"} & @{text "\<downharpoonleft>\<^bsub>n::ps\<^esub>"} & @{text "\<equiv>"}& @{thm (rhs) at.simps(6)}\\
 \end{tabular}
 \end{center}
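The recursive definition of subvalues at positions can be sketched in Python as follows. This is a minimal illustration, not the Isabelle definition: the class names and the function name `at` are chosen here for the example, positions are represented as Python lists, `None` stands for "the position is outside the value", and the clauses that are not fully visible in the table (for `Right` and for the first component of `Seq`) are assumed to follow the usual convention of index 1 and index 0, respectively.

```python
from dataclasses import dataclass

# Values (constructor names mirror the text; the encoding is an assumption).
@dataclass(frozen=True)
class Void:
    pass

@dataclass(frozen=True)
class Char:
    c: str

@dataclass(frozen=True)
class Left:
    v: object

@dataclass(frozen=True)
class Right:
    v: object

@dataclass(frozen=True)
class Seq:
    v1: object
    v2: object

@dataclass(frozen=True)
class Stars:
    vs: tuple

def at(v, p):
    """Subvalue of v at position p, or None if p lies outside v."""
    if not p:
        return v                      # root position: the whole value
    n, ps = p[0], p[1:]
    if isinstance(v, Left) and n == 0:
        return at(v.v, ps)
    if isinstance(v, Right) and n == 1:   # assumed clause
        return at(v.v, ps)
    if isinstance(v, Seq) and n == 0:     # assumed clause
        return at(v.v1, ps)
    if isinstance(v, Seq) and n == 1:
        return at(v.v2, ps)
    if isinstance(v, Stars) and 0 <= n < len(v.vs):
        return at(v.vs[n], ps)        # n-th element of the list
    return None

# The example value from the text: Stars [Seq (Char x) (Char y), Char z].
v = Stars((Seq(Char("x"), Char("y")), Char("z")))
```

For this `v`, the call `at(v, [0, 1])` yields `Char("y")` and `at(v, [1])` yields `Char("z")`, while `at(v, [0, 1, 0])` and `at(v, [2])` yield `None`, matching the positions discussed above.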
-\noindent We use Isabelle's notation @{term "vs ! n"} for the
+\noindent In the last clause we use Isabelle's notation @{term "vs ! n"} for the
 @{text n}th element in a list.  The set of positions inside a value @{text v},
 written @{term "Pos v"}, is given by the clauses
 \begin{center}
 \begin{tabular}{lcl}
 The reasoning in the other cases is similar.\qed
 \end{proof}
 \noindent We can show that @{term "DUMMY :\<sqsubseteq>val DUMMY"} is
 a partial order.  Okui and Suzuki also show that it is a linear order
-for lexical values \cite{OkuiSuzuki2010}, but we have not done
+for lexical values \cite{OkuiSuzuki2010} of a given regular
-this. What we are going to show below is that for a given @{text r}
+expression and given string, but we have not done this. It is not
-and @{text s}, the ordering has a unique minimal element on the set
+essential for our results. What we are going to show below is that
-@{term "LV r s"} , which is the POSIX value we defined in the
+for a given @{text r} and @{text s}, the ordering has a unique
-previous section.
+minimal element on the set @{term "LV r s"}, which is the POSIX value
+we defined in the previous section.
 Lemma 1
 @{thm [mode=IfThen] PosOrd_shorterE[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]}
 \begin{theorem}
 @{thm [mode=IfThen] Posix_PosOrd[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]}
 \end{theorem}
-\begin{proof}
+\begin{proof} By induction on our POSIX rules. It is clear that
-By induction on our POSIX rules. The two base cases are straightforward: for example
+@{text "v\<^sub>1"} and @{text "v\<^sub>2"} have the same underlying
-for @{term "v\<^sub>1 = Void"}, we have that @{term "v\<^sub>2 \<in> LV ONE []"} must also
+string.
-be of the form \mbox{@{term "v\<^sub>2 = Void"}}. Therfore we have @{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"}.
-The inductive cases are as follows:
+The two base cases are straightforward: for example for @{term
+"v\<^sub>1 = Void"}, we have that @{term "v\<^sub>2 \<in> LV ONE
-\begin{itemize}
+[]"} must also be of the form \mbox{@{term "v\<^sub>2 =
-\item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2) \<rightarrow> (Left w\<^sub>1)"}:
+Void"}}. Therefore we have @{term "v\<^sub>1 :\<sqsubseteq>val
-In this case @{term "v\<^sub>1 = Left w\<^sub>1"} and the value @{term "v\<^sub>2"} is either
+v\<^sub>2"}.  The inductive cases are as follows:
-of the form @{term "Left w\<^sub>2"} or @{term "Right w\<^sub>2"}. In the latter case we
-can immediately conclude with @{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"} since a @{text Left}-value
+\begin{itemize} \item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1
-with the same underlying string @{text s} is always smaller or equal than a @{text Right}-value.
+r\<^sub>2) \<rightarrow> (Left w\<^sub>1)"}: In this case @{term
-In the former case we have @{term "w\<^sub>2 \<in> LV r\<^sub>1 s"} and can use the induction
+"v\<^sub>1 = Left w\<^sub>1"} and the value @{term "v\<^sub>2"} is
-hypothesis to infer @{term "w\<^sub>1 :\<sqsubseteq>val w\<^sub>2"}. Because @{term "w\<^sub>1"}
+either of the form @{term "Left w\<^sub>2"} or @{term "Right
-and @{term "w\<^sub>2"} have the same underlying string @{text s}, we can conclude with
+w\<^sub>2"}. In the latter case we can immediately conclude with
-@{term "Left w\<^sub>1 :\<sqsubseteq>val Left w\<^sub>2"}.
+@{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"} since a @{text
+Left}-value with the same underlying string @{text s} is always
-\item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2) \<rightarrow> (Right w\<^sub>1)"}:
+smaller than or equal to a @{text Right}-value.  In the former case we
-Similarly for the case where
+have @{term "w\<^sub>2 \<in> LV r\<^sub>1 s"} and can use the
-@{term "v\<^sub>1 = Right w\<^sub>1"}.
+induction hypothesis to infer @{term "w\<^sub>1 :\<sqsubseteq>val
+w\<^sub>2"}. Because @{term "w\<^sub>1"} and @{term "w\<^sub>2"}
-\item[$\bullet$]
+have the same underlying string @{text s}, we can conclude with
+@{term "Left w\<^sub>1 :\<sqsubseteq>val Left w\<^sub>2"}.\smallskip
+\item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2)
+\<rightarrow> (Right w\<^sub>1)"}: This case is similar to the previous
+case, except that we know that @{term "s \<notin> L
+r\<^sub>1"}. This is needed when @{term "v\<^sub>2 = Left
+w\<^sub>2"}: since \mbox{@{term "flat v\<^sub>2 = flat w\<^sub>2"}
+@{text "= s"}} and @{term "\<Turnstile> w\<^sub>2 : r\<^sub>1"}, we
+can derive a contradiction using Prop.~\ref{inhabs}. So also in this
+case \mbox{@{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"}}.\smallskip
+\item[$\bullet$]  Case @{term "(s\<^sub>1 @ s\<^sub>2) \<in> (SEQ r\<^sub>1 r\<^sub>2)
+\<rightarrow> (Seq w\<^sub>1 w\<^sub>2)"}: Assume @{term "v\<^sub>2 =
+Seq (u\<^sub>1) (u\<^sub>2)"} with @{term "\<Turnstile> u\<^sub>1 : r\<^sub>1"}
+and \mbox{@{term "\<Turnstile> u\<^sub>2 : r\<^sub>2"}}. We have
 \end{itemize}
 \end{proof}
 Given a lexical value @{text "v\<^sub>1"}, say, in @{term "LV r s"} for which there does
 not exist any smaller value in @{term "LV r s"}, this value is our POSIX value:

changeset 269	12772d537b71
parent 268	6746f5e1f1f8
child 270	462d893ecb3d