--- a/thys/Paper/Paper.thy Tue May 17 14:28:22 2016 +0100
+++ b/thys/Paper/Paper.thy Wed May 18 15:57:46 2016 +0100
@@ -97,7 +97,7 @@
derivatives is that they are neatly expressible in any functional language,
and easily definable and reasoned about in theorem provers---the definitions
just consist of inductive datatypes and simple recursive functions. A
-completely formalised correctness proof of this matcher in for example HOL4
+mechanised correctness proof of Brzozowski's matcher in for example HOL4
has been mentioned by Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part
of the work by Krauss and Nipkow \cite{Krauss2011}. And another one in Coq is given
by Coquand and Siles \cite{Coquand2012}.
@@ -135,7 +135,7 @@
that can match determines the token.
\end{itemize}
-\noindent Consider for example @{text "r\<^bsub>key\<^esub>"} recognising keywords
+\noindent Consider for example a regular expression @{text "r\<^bsub>key\<^esub>"} for recognising keywords
such as @{text "if"}, @{text "then"} and so on; and @{text "r\<^bsub>id\<^esub>"}
recognising identifiers (say, a single character followed by
characters or numbers). Then we can form the regular expression
@@ -269,10 +269,15 @@
"s\<^sub>1 @ s\<^sub>2"} is in the star of this language. It will also be convenient
to use the following notion of a \emph{semantic derivative} (or \emph{left
quotient}) of a language defined as
- @{thm (lhs) Der_def} $\dn$ @{thm (rhs) Der_def}.
+ %
+ \begin{center}
+ @{thm Der_def}\;.
+ \end{center}
+
+ \noindent
For semantic derivatives we have the following equations (for example
mechanically proved in \cite{Krauss2011}):
-
+ %
\begin{equation}\label{SemDer}
\begin{array}{lcl}
@{thm (lhs) Der_null} & \dn & @{thm (rhs) Der_null}\\
@@ -322,9 +327,6 @@
\begin{center}
\begin{tabular}{lcl}
@{thm (lhs) ders.simps(1)} & $\dn$ & @{thm (rhs) ders.simps(1)}\\
- \end{tabular}
- \hspace{20mm}
- \begin{tabular}{lcl}
@{thm (lhs) ders.simps(2)} & $\dn$ & @{thm (rhs) ders.simps(2)}\\
\end{tabular}
\end{center}
@@ -405,10 +407,10 @@
\begin{tabular}{c}
\\[-8mm]
@{thm[mode=Axiom] Prf.intros(4)} \qquad
- @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\\[3mm]
+ @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\\[4mm]
@{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]} \qquad
- @{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>1" "r\<^sub>2"]}\\[3mm]
- @{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\\[3mm]
+ @{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>1" "r\<^sub>2"]}\\[4mm]
+ @{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\\[4mm]
@{thm[mode=Axiom] Prf.intros(6)[of "r"]} \qquad
@{thm[mode=Rule] Prf.intros(7)[of "v" "r" "vs"]}
\end{tabular}
@@ -472,7 +474,7 @@
left to right) is \Brz's matcher building successive derivatives. If the
last regular expression is @{term nullable}, then the functions of the
second phase are called (the top-down and right-to-left arrows): first
-@{term mkeps} calculates a value witnessing
+@{term mkeps} calculates a value @{term "v\<^sub>4"} witnessing
how the empty string has been recognised by @{term "r\<^sub>4"}. After
that the function @{term inj} ``injects back'' the characters of the string into
the values.
@@ -502,12 +504,12 @@
The most interesting idea from Sulzmann and Lu \cite{Sulzmann2014} is
the construction of a value for how @{term "r\<^sub>1"} can match the
string @{term "[a,b,c]"} from the value how the last derivative, @{term
- "r\<^sub>4"} in Fig~\ref{Sulz}, can match the empty string. Sulzmann and
+ "r\<^sub>4"} in Fig.~\ref{Sulz}, can match the empty string. Sulzmann and
Lu achieve this by stepwise ``injecting back'' the characters into the
values thus inverting the operation of building derivatives, but on the level
of values. The corresponding function, called @{term inj}, takes three
arguments, a regular expression, a character and a value. For example in
- the first (or right-most) @{term inj}-step in Fig~\ref{Sulz} the regular
+ the first (or right-most) @{term inj}-step in Fig.~\ref{Sulz} the regular
expression @{term "r\<^sub>3"}, the character @{term c} from the last
derivative step and @{term "v\<^sub>4"}, which is the value corresponding
to the derivative regular expression @{term "r\<^sub>4"}. The result is
@@ -538,7 +540,7 @@
might be instructive to look first at the three sequence cases (clauses
(4)--(6)). In each case we need to construct an ``injected value'' for
@{term "SEQ r\<^sub>1 r\<^sub>2"}. This must be a value of the form @{term
- "Seq DUMMY DUMMY"}. Recall the clause of the @{text derivative}-function
+ "Seq DUMMY DUMMY"}\,. Recall the clause of the @{text derivative}-function
for sequence regular expressions:
\begin{center}
@@ -674,7 +676,7 @@
"PS"}). The first two premises state that @{term "v\<^sub>1"} and @{term "v\<^sub>2"}
are the POSIX values for @{term "(s\<^sub>1, r\<^sub>1)"} and @{term "(s\<^sub>2, r\<^sub>2)"}
respectively. Consider now the third premise and note that the POSIX value
- of this rule should match the string @{term "s\<^sub>1 @ s\<^sub>2"}. According to the
+ of this rule should match the string \mbox{@{term "s\<^sub>1 @ s\<^sub>2"}}. According to the
longest match rule, we want that the @{term "s\<^sub>1"} is the longest initial
split of \mbox{@{term "s\<^sub>1 @ s\<^sub>2"}} such that @{term "s\<^sub>2"} is still recognised
by @{term "r\<^sub>2"}. Let us assume, contrary to the third premise, that there
@@ -685,7 +687,7 @@
matched by @{term "r\<^sub>2"}. In this case @{term "s\<^sub>1"} would \emph{not} be the
longest initial split of \mbox{@{term "s\<^sub>1 @ s\<^sub>2"}} and therefore @{term "Seq v\<^sub>1
v\<^sub>2"} cannot be a POSIX value for @{term "(s\<^sub>1 @ s\<^sub>2, SEQ r\<^sub>1 r\<^sub>2)"}.
- The main point is that this side-condition ensures the longest
+ The main point is that our side-condition ensures the longest
match rule is satisfied.
A similar condition is imposed on the POSIX value in the @{text
@@ -716,8 +718,10 @@
\end{lemma}
\begin{proof}
+ By induction on @{text r}. We explain two cases.
- By induction on @{text r}. Suppose @{term "r = ALT r\<^sub>1 r\<^sub>2"}. There are
+ \begin{itemize}
+ \item[$\bullet$] Case @{term "r = ALT r\<^sub>1 r\<^sub>2"}. There are
two subcases, namely @{text "(a)"} \mbox{@{term "v = Left v'"}} and @{term
"s \<in> der c r\<^sub>1 \<rightarrow> v'"}; and @{text "(b)"} @{term "v = Right v'"}, @{term
"s \<notin> L (der c r\<^sub>1)"} and @{term "s \<in> der c r\<^sub>2 \<rightarrow> v'"}. In @{text "(a)"} we
@@ -728,7 +732,7 @@
Prop.~\ref{derprop}(2) in order to infer @{term "c # s \<notin> L r\<^sub>1"} from @{term
"s \<notin> L (der c r\<^sub>1)"}.
- Suppose @{term "r = SEQ r\<^sub>1 r\<^sub>2"}. There are three subcases:
+ \item[$\bullet$] Case @{term "r = SEQ r\<^sub>1 r\<^sub>2"}. There are three subcases:
\begin{quote}
\begin{description}
@@ -768,6 +772,7 @@
c v\<^sub>1) \<noteq> []"}. This follows from @{term "(c # s\<^sub>1)
\<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"} (which in turn follows from @{term "s\<^sub>1 \<in> der c
r\<^sub>1 \<rightarrow> v\<^sub>1"} and the induction hypothesis).\qed
+ \end{itemize}
\end{proof}
\noindent
@@ -787,11 +792,14 @@
By induction on @{term s} using Lem.~\ref{lemmkeps} and \ref{Posix2}.\qed
\end{proof}
- \noindent This concludes our correctness proof. Note that we have not
- changed the algorithm of Sulzmann and Lu,\footnote{All deviations we
- introduced are harmless.} but introduced our own specification for what a
- correct result---a POSIX value---should be. A strong point in favour of
- Sulzmann and Lu's algorithm is that it can be extended in various ways.
+ \noindent In (2) we further know by Thm.~\ref{posixdeterm} that the
+ value returned by the lexer must be unique. This concludes our
+ correctness proof. Note that we have not changed the algorithm of
+ Sulzmann and Lu,\footnote{All deviations we introduced are
+ harmless.} but introduced our own specification for what a correct
+ result---a POSIX value---should be. A strong point in favour of
+ Sulzmann and Lu's algorithm is that it can be extended in various
+ ways.
*}
@@ -843,9 +851,7 @@
algorithms considerably, as noted in \cite{Sulzmann2014}. One of the
advantages of having a simple specification and correctness proof is that
the latter can be refined to prove the correctness of such simplification
- steps.
-
- While the simplification of regular expressions according to
+ steps. While the simplification of regular expressions according to
rules like
\begin{equation}\label{Simpl}
@@ -930,7 +936,7 @@
@{term v} (that is construct @{term "f\<^sub>r v"}). Before we can establish the correctness
of @{term "slexer"}, we need to show that simplification preserves the language
and simplification preserves our POSIX relation once the value is rectified
- (recall @{const "simp"} generates a regular expression / rectification function pair):
+ (recall @{const "simp"} generates a (regular expression, rectification function) pair):
\begin{lemma}\mbox{}\smallskip\\\label{slexeraux}
\begin{tabular}{ll}
@@ -1108,7 +1114,7 @@
Although they do not give an explicit proof of the transitivity property,
they give a closely related property about the existence of maximal
- elements. They state that this can be verified by an induction on $r$. We
+ elements. They state that this can be verified by an induction on @{term r}. We
disagree with this as we shall show next in case of transitivity. The case
where the reasoning breaks down is the sequence case, say @{term "SEQ r\<^sub>1 r\<^sub>2"}.
The induction hypotheses in this case are