--- a/thys/Paper/Paper.thy Mon Mar 07 03:23:28 2016 +0000
+++ b/thys/Paper/Paper.thy Mon Mar 07 18:56:41 2016 +0000
@@ -83,7 +83,7 @@
correct according to the specification.
The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a
-relation (called an ``Order Relation'') on the set of values of @{term r},
+relation (called an ``order relation'') on the set of values of @{term r},
and to show that (once a string to be matched is chosen) there is a maximum
element and that it is computed by their derivative-based algorithm. This
proof idea is inspired by work of Frisch and Cardelli \cite{Frisch2004} on a
@@ -152,7 +152,7 @@
POSIX matching to tokenise strings, say @{text "iffoo"} and @{text "if"}.
For @{text "iffoo"} we obtain by the longest match rule a single identifier
token, not a keyword followed by an identifier. For @{text "if"} we obtain by
-rule priority a keyword token, not an identifier token---even if @{text
+the priority rule a keyword token, not an identifier token---even if @{text
"r\<^bsub>id\<^esub>"} matches also.\bigskip
\noindent {\bf Contributions:} (NOT DONE YET) We have implemented in
@@ -213,17 +213,17 @@
recursive function @{term L} with the clauses:
\begin{center}
- \begin{tabular}{rcl}
- @{thm (lhs) L.simps(1)} & $\dn$ & @{thm (rhs) L.simps(1)}\\
- @{thm (lhs) L.simps(2)} & $\dn$ & @{thm (rhs) L.simps(2)}\\
- @{thm (lhs) L.simps(3)} & $\dn$ & @{thm (rhs) L.simps(3)}\\
- @{thm (lhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
- @{thm (lhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\
- @{thm (lhs) L.simps(6)} & $\dn$ & @{thm (rhs) L.simps(6)}\\
+ \begin{tabular}{l@ {\hspace{5mm}}rcl}
+ (1) & @{thm (lhs) L.simps(1)} & $\dn$ & @{thm (rhs) L.simps(1)}\\
+ (2) & @{thm (lhs) L.simps(2)} & $\dn$ & @{thm (rhs) L.simps(2)}\\
+ (3) & @{thm (lhs) L.simps(3)} & $\dn$ & @{thm (rhs) L.simps(3)}\\
+ (4) & @{thm (lhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
+ (5) & @{thm (lhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\
+ (6) & @{thm (lhs) L.simps(6)} & $\dn$ & @{thm (rhs) L.simps(6)}\\
\end{tabular}
\end{center}
- \noindent In the fourth clause we use the operation @{term "DUMMY ;;
+ \noindent In clause (4) we use the operation @{term "DUMMY ;;
DUMMY"} for the concatenation of two languages (it is also list-append for
strings). We use the star-notation for regular expressions and for
languages (in the last clause above). The star for languages is defined
@@ -310,11 +310,11 @@
\noindent gives a positive answer if and only if @{term "s \<in> L r"}.
Consequently, this regular expression matching algorithm satisfies the
- usual specification. While the matcher above calculates a provably correct
- YES/NO answer for whether a regular expression matches a string, the novel
- idea of Sulzmann and Lu \cite{Sulzmann2014} is to append another phase to
- this algorithm in order to calculate a [lexical] value. We will explain
- the details next.
+ usual specification for regular expression matching. While the matcher
+ above calculates a provably correct YES/NO answer for whether a regular
+ expression matches a string or not, the novel idea of Sulzmann and Lu
+ \cite{Sulzmann2014} is to append another phase to this algorithm in order
+ to calculate a [lexical] value. We will explain the details next.
*}
@@ -338,11 +338,11 @@
@{term "Stars vs"}
\end{center}
- \noindent where we use @{term vs} standing for a list of values. (This is
+ \noindent where we use @{term vs} to stand for a list of values. (This is
similar to the approach taken by Frisch and Cardelli for GREEDY matching
- \cite{Frisch2004}, and Sulzmann and Lu \cite{Sulzmann2014} for POSIX matching).
- The string underlying a value can be calculated by the @{const flat}
- function, written @{term "flat DUMMY"} and defined as:
+ \cite{Frisch2004}, and Sulzmann and Lu \cite{Sulzmann2014} for POSIX
+ matching). The string underlying a value can be calculated by the @{const
+ flat} function, written @{term "flat DUMMY"} and defined as:
\begin{center}
\begin{tabular}{lcl}
@@ -422,14 +422,15 @@
\draw (r4) node[anchor=north west] {\;\raisebox{-8mm}{@{term "mkeps"}}};
\end{tikzpicture}
\end{center}
-\caption{The two phases of the algorithm by Sulzmann \& Lu \cite{Sulzmann2014}
+\caption{The two phases of the algorithm by Sulzmann \& Lu \cite{Sulzmann2014},
matching the string @{term "[a,b,c]"}. The first phase (the arrows from
-left to right) is \Brz's matcher building succesive derivatives. If at the
-last regular expression is @{term nullable}, then functions of the
-second phase are called: first @{term mkeps} calculates a value witnessing
+left to right) is \Brz's matcher building succesive derivatives. If the
+last regular expression is @{term nullable}, then the functions of the
+second phase are called (the top-down and right-to-left arrows): first
+@{term mkeps} calculates a value witnessing
how the empty string has been recognised by @{term "r\<^sub>4"}. After
that the function @{term inj} `injects back' the characters of the string into
-the values (the arrows from right to left).
+the values.
\label{Sulz}}
\end{figure}
@@ -450,9 +451,10 @@
alternative regular expression, say @{term "ALT r\<^sub>1 r\<^sub>2"}, can
match the empty string and furthermore @{term "r\<^sub>1"} can match the
empty string, then we return a @{text Left}-value. The @{text
- Right}-value will only be returned if @{term "r\<^sub>1"} is not nullable.
+ Right}-value will only be returned if @{term "r\<^sub>1"} cannot match the empty
+ string.
- The most interesting novelty from Sulzmann and Lu \cite{Sulzmann2014} is
+ The most interesting idea from Sulzmann and Lu \cite{Sulzmann2014} is
the construction of a value for how @{term "r\<^sub>1"} can match the
string @{term "[a,b,c]"} from the value how the last derivative, @{term
"r\<^sub>4"} in Fig~\ref{Sulz}, can match the empty string. Sulzmann and
@@ -510,9 +512,9 @@
we match in the clause (5) is @{term "Left (Seq v\<^sub>1 v\<^sub>2)"},
while in (6) it is just @{term "Right v\<^sub>2"}. One more interesting
point is in the right-hand side of clause (6): since in this case the
- regular expression @{text "r\<^sub>1"} does not ``contribute'' for
- matching the string, that is only matches the empty string, we need to
- call @{const mkeps} in order to construct a value how @{term "r\<^sub>1"}
+ regular expression @{text "r\<^sub>1"} does not ``contribute'' to
+ matching the string, that means it only matches the empty string, we need to
+ call @{const mkeps} in order to construct a value for how @{term "r\<^sub>1"}
can match this empty string. A similar argument applies for why we can
expect in the left-hand side of clause (7) that the value is of the form
@{term "Seq v (Stars vs)"}---the derivative of a star is @{term "SEQ r
@@ -524,6 +526,18 @@
injval.simps(1)[of "c" "c"]} $\dn$ @{thm (rhs) injval.simps(1)[of "c"]},
but our deviation is harmless.}
+ The idea of @{term inj} to ``inject back'' a character into a value can
+ be made precise by the first part of the following lemma; the second
+ part shows that the underlying string of an @{const mkeps}-value is
+ the empty string.
+
+ \begin{lemma}\mbox{}\\\label{Prf_injval_flat}
+ \begin{tabular}{ll}
+ (1) & @{thm[mode=IfThen] Prf_injval_flat}\\
+ (2) & @{thm[mode=IfThen] mkeps_flat}
+ \end{tabular}
+ \end{lemma}
+
Having defined the @{const mkeps} and @{text inj} function we can extend
\Brz's matcher so that a [lexical] value is constructed (assuming the
regular expression matches the string). The clauses of the lexer are
@@ -599,9 +613,7 @@
@{thm[mode=IfThen] PMatch_mkeps}
\end{lemma}
- \begin{lemma}
- @{thm[mode=IfThen] Prf_injval_flat}
- \end{lemma}
+
\begin{lemma}
@{thm[mode=IfThen] PMatch2_roy_version}