lexing: comparison thys/Paper/Paper.thy


-:42ffaca7c85e
+:162f112b814b
 CHAR ("_" [1000] 80) and
 ALT ("_ + _" [77,77] 78) and
 SEQ ("_ \<cdot> _" [77,77] 78) and
 STAR ("_\<^sup>\<star>" [1000] 78) and
-val.Void ("'(')" 79) and
+val.Void ("'(')" 1000) and
 val.Char ("Char _" [1000] 78) and
 val.Left ("Left _" [79] 78) and
-val.Right ("Right _" [79] 78) and
+val.Right ("Right _" [1000] 78) and
 val.Seq ("Seq _ _" [79,79] 78) and
 val.Stars ("Stars _" [79] 78) and
 L ("L'(_')" [10] 78) and
 der_syn ("_\\_" [79, 1000] 76) and
 inductively by two clauses: @{text "(i)"} the empty string being in
 the star of a language and @{text "(ii)"} if @{term "s\<^sub>1"} is in a
 language and @{term "s\<^sub>2"} in the star of this language, then also @{term
 "s\<^sub>1 @ s\<^sub>2"} is in the star of this language. It will also be convenient
 to use the following notion of a \emph{semantic derivative} (or \emph{left
-quotient}) of a language defined as:
+quotient}) of a language defined as
+@{thm (lhs) Der_def} $\dn$ @{thm (rhs) Der_def}.
-\begin{center}
-\begin{tabular}{lcl}
-@{thm (lhs) Der_def} & $\dn$ & @{thm (rhs) Der_def}\\
-\end{tabular}
-\end{center}
-\noindent
 For semantic derivatives we have the following equations (for example
 mechanically proved in \cite{Krauss2011}):
 \begin{equation}\label{SemDer}
 \begin{array}{lcl}
 We may extend this definition to give derivatives w.r.t.~strings:
 \begin{center}
 \begin{tabular}{lcl}
 @{thm (lhs) ders.simps(1)} & $\dn$ & @{thm (rhs) ders.simps(1)}\\
+\end{tabular}
+\hspace{20mm}
+\begin{tabular}{lcl}
 @{thm (lhs) ders.simps(2)} & $\dn$ & @{thm (rhs) ders.simps(2)}\\
 \end{tabular}
 \end{center}
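As a small illustration of the semantic derivative and of its extension to strings, the following Python sketch computes both for \emph{finite} languages only, represented as sets of strings. Finiteness is an assumption made purely for the example, since the languages considered here are in general infinite. The names @{text "der_lang"} and @{text "ders_lang"} are hypothetical; @{text "ders_lang"} transfers the two @{text "ders"} equations above from regular expressions to languages.

```python
from functools import reduce
from typing import Set


def der_lang(c: str, lang: Set[str]) -> Set[str]:
    """Semantic derivative (left quotient): all s such that c + s is in lang."""
    return {w[1:] for w in lang if w[:1] == c}


def ders_lang(s: str, lang: Set[str]) -> Set[str]:
    """String extension: ders [] A = A and ders (c # s) A = ders s (Der c A)."""
    return reduce(lambda acc, c: der_lang(c, acc), s, lang)


A = {"ab", "abc", "b", ""}
print(sorted(der_lang("a", A)))    # ['b', 'bc']
print(sorted(ders_lang("ab", A)))  # ['', 'c']
```

By construction @{text "ders_lang s A"} is the set of all @{text t} such that @{text "s @ t"} is in @{text A}, so the empty string belongs to it exactly when @{text s} belongs to @{text A}.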
 \noindent Given the equations in \eqref{SemDer}, it is a relatively easy
 \end{tabular}
 \end{proposition}
 \noindent With this in place it is also routine to prove that the
 regular expression matcher defined as
+%
 \begin{center}
 @{thm match_def}
 \end{center}
 \noindent gives a positive answer if and only if @{term "s \<in> L r"}.
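The matcher can be prototyped directly. The following Python sketch assumes the standard Brzozowski clauses for @{text nullable} and @{text der} (their definitions are not part of this excerpt) and encodes regular expressions as frozen dataclasses, created with the standard-library function @{text "make_dataclass"} and named after the constructors of the paper. The function name @{text matcher} is hypothetical; it answers by testing whether the iterated derivative of @{text r} with respect to @{text s} is nullable, which we assume is the shape of the matcher's definition above.

```python
from dataclasses import make_dataclass
from functools import reduce

# Regular expressions as immutable records named after the paper's constructors.
ZERO = make_dataclass("ZERO", [], frozen=True)
ONE = make_dataclass("ONE", [], frozen=True)
CHAR = make_dataclass("CHAR", ["c"], frozen=True)
ALT = make_dataclass("ALT", ["r1", "r2"], frozen=True)
SEQ = make_dataclass("SEQ", ["r1", "r2"], frozen=True)
STAR = make_dataclass("STAR", ["r"], frozen=True)


def nullable(r) -> bool:
    """True iff the empty string is in L(r)."""
    if isinstance(r, (ONE, STAR)):
        return True
    if isinstance(r, ALT):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, SEQ):
        return nullable(r.r1) and nullable(r.r2)
    return False  # ZERO and CHAR


def der(c: str, r):
    """Brzozowski derivative of r with respect to the character c."""
    if isinstance(r, CHAR):
        return ONE() if r.c == c else ZERO()
    if isinstance(r, ALT):
        return ALT(der(c, r.r1), der(c, r.r2))
    if isinstance(r, SEQ):
        first = SEQ(der(c, r.r1), r.r2)
        return ALT(first, der(c, r.r2)) if nullable(r.r1) else first
    if isinstance(r, STAR):
        return SEQ(der(c, r.r), r)
    return ZERO()  # ZERO and ONE


def ders(s: str, r):
    """ders [] r = r and ders (c # s) r = ders s (der c r)."""
    return reduce(lambda acc, c: der(c, acc), s, r)


def matcher(r, s: str) -> bool:
    """Accept s iff the derivative of r w.r.t. all of s is nullable."""
    return nullable(ders(s, r))


print(matcher(STAR(ALT(CHAR("a"), CHAR("b"))), "abba"))  # True
print(matcher(SEQ(CHAR("a"), CHAR("b")), "a"))           # False
```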
 POSIX-specific choices into the side-conditions of our rules. Our
 definition is inspired by the matching relation given by Vansummeren
 \cite{Vansummeren2006}. The relation we define is ternary and written as
 \mbox{@{term "s \<in> r \<rightarrow> v"}}, relating strings, regular expressions and
 values.
+%
-\begin{center}
+\begin{center}\small
 \begin{tabular}{c}
 @{thm[mode=Axiom] Posix.intros(1)}@{text "P"}@{term "ONE"} \qquad
-@{thm[mode=Axiom] Posix.intros(2)}@{text "P"}@{term "c"}\bigskip\\
+@{thm[mode=Axiom] Posix.intros(2)}@{text "P"}@{term "c"}\medskip\\
 @{thm[mode=Rule] Posix.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}@{text "P+L"}\qquad
-@{thm[mode=Rule] Posix.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}@{text "P+R"}\bigskip\\
+@{thm[mode=Rule] Posix.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}@{text "P+R"}\medskip\\
 $\mprset{flushleft}
 \inferrule
 {@{thm (prem 1) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]} \qquad
 @{thm (prem 2) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]} \\\\
 @{thm (prem 3) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}
 {@{thm (concl) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}$@{text "PS"}\\
-@{thm[mode=Axiom] Posix.intros(7)}@{text "P[]"}\bigskip\\
+@{thm[mode=Axiom] Posix.intros(7)}@{text "P[]"}\medskip\\
 $\mprset{flushleft}
 \inferrule
 {@{thm (prem 1) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]} \qquad
 @{thm (prem 2) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]} \qquad
 @{thm (prem 3) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]} \\\\
 \end{description}
 \end{quote}
 \noindent For @{text "(a)"} we know @{term "s\<^sub>1 \<in> der c r\<^sub>1 \<rightarrow> v\<^sub>1"} and
 @{term "s\<^sub>2 \<in> r\<^sub>2 \<rightarrow> v\<^sub>2"} as well as
+%
 \[@{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \<and> s\<^sub>1 @ s\<^sub>3 \<in> L (der c r\<^sub>1) \<and> s\<^sub>4 \<in> L r\<^sub>2)"}\]
 \noindent From the latter we can infer by Prop.~\ref{derprop}(2):
+%
 \[@{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \<and> (c # s\<^sub>1) @ s\<^sub>3 \<in> L r\<^sub>1 \<and> s\<^sub>4 \<in> L r\<^sub>2)"}\]
 \noindent We can use the induction hypothesis for @{text "r\<^sub>1"} to obtain
 @{term "(c # s\<^sub>1) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"}. Putting this all together allows us to infer
 @{term "((c # s\<^sub>1) @ s\<^sub>2) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (injval r\<^sub>1 c v\<^sub>1) v\<^sub>2"}. The case @{text "(c)"}
 For @{text "(b)"} we know @{term "s \<in> der c r\<^sub>2 \<rightarrow> v\<^sub>1"} and
 @{term "s \<notin> L (SEQ (der c r\<^sub>1) r\<^sub>2)"}. From the former
 we have @{term "(c # s) \<in> r\<^sub>2 \<rightarrow> (injval r\<^sub>2 c v\<^sub>1)"} by induction hypothesis
 for @{term "r\<^sub>2"}. From the latter we can infer
+%
 \[@{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = c # s \<and> s\<^sub>3 \<in> L r\<^sub>1 \<and> s\<^sub>4 \<in> L r\<^sub>2)"}\]
 \noindent By Lem.~\ref{lemmkeps} we know @{term "[] \<in> r\<^sub>1 \<rightarrow> (mkeps r\<^sub>1)"}
 holds. Putting this all together, we can conclude with @{term "(c #
 s) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (mkeps r\<^sub>1) (injval r\<^sub>2 c v\<^sub>1)"}, as required.
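The case analysis above is about @{text injval}. To see this function at work, here is a minimal Python sketch of the injection-based lexer in the style of \cite{Sulzmann2014}. It assumes the usual clauses for @{text nullable}, @{text der}, @{text mkeps}, @{text injval} and @{text lexer}, which are not reproduced in this excerpt, and encodes regular expressions and values as frozen dataclasses named after the paper's constructors; it is an unverified sketch, not the Isabelle definition. The @{text SEQ} branch for a @{text Right} value pairs @{text "mkeps r\<^sub>1"} with the injected value for @{text "r\<^sub>2"}, as in case @{text "(b)"}.

```python
from dataclasses import make_dataclass
from typing import Optional

# Regular expressions (constructor names follow the paper).
ZERO = make_dataclass("ZERO", [], frozen=True)
ONE = make_dataclass("ONE", [], frozen=True)
CHAR = make_dataclass("CHAR", ["c"], frozen=True)
ALT = make_dataclass("ALT", ["r1", "r2"], frozen=True)
SEQ = make_dataclass("SEQ", ["r1", "r2"], frozen=True)
STAR = make_dataclass("STAR", ["r"], frozen=True)

# Values recording how a regular expression matched a string.
Void = make_dataclass("Void", [], frozen=True)
Char = make_dataclass("Char", ["c"], frozen=True)
Left = make_dataclass("Left", ["v"], frozen=True)
Right = make_dataclass("Right", ["v"], frozen=True)
Seq = make_dataclass("Seq", ["v1", "v2"], frozen=True)
Stars = make_dataclass("Stars", ["vs"], frozen=True)  # vs is a tuple of values


def nullable(r) -> bool:
    if isinstance(r, (ONE, STAR)):
        return True
    if isinstance(r, ALT):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, SEQ):
        return nullable(r.r1) and nullable(r.r2)
    return False  # ZERO and CHAR


def der(c: str, r):
    if isinstance(r, CHAR):
        return ONE() if r.c == c else ZERO()
    if isinstance(r, ALT):
        return ALT(der(c, r.r1), der(c, r.r2))
    if isinstance(r, SEQ):
        first = SEQ(der(c, r.r1), r.r2)
        return ALT(first, der(c, r.r2)) if nullable(r.r1) else first
    if isinstance(r, STAR):
        return SEQ(der(c, r.r), r)
    return ZERO()  # ZERO and ONE


def mkeps(r):
    """Value for how a nullable r matches [], preferring Left as POSIX requires."""
    if isinstance(r, ONE):
        return Void()
    if isinstance(r, ALT):
        return Left(mkeps(r.r1)) if nullable(r.r1) else Right(mkeps(r.r2))
    if isinstance(r, SEQ):
        return Seq(mkeps(r.r1), mkeps(r.r2))
    if isinstance(r, STAR):
        return Stars(())
    raise ValueError("mkeps applied to a non-nullable regular expression")


def injval(r, c: str, v):
    """Turn a value v for der c r into a value for r that also covers c."""
    if isinstance(r, CHAR):  # v is Void
        return Char(r.c)
    if isinstance(r, ALT):
        if isinstance(v, Left):
            return Left(injval(r.r1, c, v.v))
        return Right(injval(r.r2, c, v.v))
    if isinstance(r, SEQ):
        if isinstance(v, Seq):  # r1 was not nullable
            return Seq(injval(r.r1, c, v.v1), v.v2)
        if isinstance(v, Left):  # v = Left (Seq v1 v2)
            return Seq(injval(r.r1, c, v.v.v1), v.v.v2)
        return Seq(mkeps(r.r1), injval(r.r2, c, v.v))  # v = Right v2, case (b)
    if isinstance(r, STAR):  # v = Seq v1 (Stars vs)
        return Stars((injval(r.r, c, v.v1),) + v.v2.vs)
    raise ValueError("injval: value does not fit the regular expression")


def lexer(r, s: str) -> Optional[object]:
    """Derive r by every character of s, then inject the characters back."""
    if not s:
        return mkeps(r) if nullable(r) else None
    v = lexer(der(s[0], r), s[1:])
    return None if v is None else injval(r, s[0], v)


v = lexer(STAR(ALT(CHAR("a"), CHAR("b"))), "ab")
# v == Stars((Left(Char("a")), Right(Char("b"))))
```

Here @{text lexer} returns @{text None} when the string is not in the language of the regular expression.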
 While the simplification of regular expressions according to
 rules like
 \begin{equation}\label{Simpl}
-\begin{array}{lcl}
+\begin{array}{lcllcllcllcl}
-@{term "ALT ZERO r"} & @{text "\<Rightarrow>"} & @{term r}\\
+@{term "ALT ZERO r"} & @{text "\<Rightarrow>"} & @{term r} \hspace{8mm}%\\
-@{term "ALT r ZERO"} & @{text "\<Rightarrow>"} & @{term r}\\
+@{term "ALT r ZERO"} & @{text "\<Rightarrow>"} & @{term r} \hspace{8mm}%\\
-@{term "SEQ ONE r"}  & @{text "\<Rightarrow>"} & @{term r}\\
+@{term "SEQ ONE r"}  & @{text "\<Rightarrow>"} & @{term r} \hspace{8mm}%\\
 @{term "SEQ r ONE"}  & @{text "\<Rightarrow>"} & @{term r}
 \end{array}
 \end{equation}
 \noindent is well understood, there is an obstacle with the POSIX value
 a \emph{rectification function} that ``repairs'' the incorrect value.
 The rectification functions can be (slightly clumsily) implemented in
 Isabelle/HOL as follows, using some auxiliary functions:
-\begin{center}
+\begin{center}\small
+\begin{tabular}{cc}
 \begin{tabular}{lcl}
 @{thm (lhs) F_RIGHT.simps(1)} & $\dn$ & @{text "Right (f v)"}\\
 @{thm (lhs) F_LEFT.simps(1)} & $\dn$ & @{text "Left (f v)"}\\
 @{thm (lhs) F_ALT.simps(1)} & $\dn$ & @{text "Right (f\<^sub>2 v)"}\\
 @{thm (lhs) F_ALT.simps(2)} & $\dn$ & @{text "Left (f\<^sub>1 v)"}\\
 @{thm (lhs) F_SEQ1.simps(1)} & $\dn$ & @{text "Seq (f\<^sub>1 ()) (f\<^sub>2 v)"}\\
 @{thm (lhs) F_SEQ2.simps(1)} & $\dn$ & @{text "Seq (f\<^sub>1 v) (f\<^sub>2 ())"}\\
-@{thm (lhs) F_SEQ.simps(1)} & $\dn$ & @{text "Seq (f\<^sub>1 v\<^sub>1) (f\<^sub>2 v\<^sub>2)"}\bigskip\\
+@{thm (lhs) F_SEQ.simps(1)} & $\dn$ & @{text "Seq (f\<^sub>1 v\<^sub>1) (f\<^sub>2 v\<^sub>2)"}%\bigskip\\
+\end{tabular}
+&
+\begin{tabular}{lcl}
 @{term "simp_ALT (ZERO, DUMMY) (r\<^sub>2, f\<^sub>2)"} & $\dn$ & @{term "(r\<^sub>2, F_RIGHT f\<^sub>2)"}\\
 @{term "simp_ALT (r\<^sub>1, f\<^sub>1) (ZERO, DUMMY)"} & $\dn$ & @{term "(r\<^sub>1, F_LEFT f\<^sub>1)"}\\
 @{term "simp_ALT (r\<^sub>1, f\<^sub>1) (r\<^sub>2, f\<^sub>2)"} & $\dn$ & @{term "(ALT r\<^sub>1 r\<^sub>2, F_ALT f\<^sub>1 f\<^sub>2)"}\\
 @{term "simp_SEQ (ONE, f\<^sub>1) (r\<^sub>2, f\<^sub>2)"} & $\dn$ & @{term "(r\<^sub>2, F_SEQ1 f\<^sub>1 f\<^sub>2)"}\\
 @{term "simp_SEQ (r\<^sub>1, f\<^sub>1) (ONE, f\<^sub>2)"} & $\dn$ & @{term "(r\<^sub>1, F_SEQ2 f\<^sub>1 f\<^sub>2)"}\\
 @{term "simp_SEQ (r\<^sub>1, f\<^sub>1) (r\<^sub>2, f\<^sub>2)"} & $\dn$ & @{term "(SEQ r\<^sub>1 r\<^sub>2, F_SEQ f\<^sub>1 f\<^sub>2)"}\\
+\end{tabular}
 \end{tabular}
 \end{center}
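The pairing of simplified regular expressions with rectification functions can be transcribed almost line by line. The following Python sketch assumes that regular expressions and values are encoded as frozen dataclasses named after the paper's constructors, and represents rectification functions as Python closures. Constructors that simplification leaves alone (such as @{text STAR}) fall into the identity case and are omitted. It illustrates the table above; it is not the Isabelle definition itself.

```python
from dataclasses import make_dataclass
from typing import Callable, Tuple

# Regular expressions: only the constructors that simp inspects.
ZERO = make_dataclass("ZERO", [], frozen=True)
ONE = make_dataclass("ONE", [], frozen=True)
CHAR = make_dataclass("CHAR", ["c"], frozen=True)
ALT = make_dataclass("ALT", ["r1", "r2"], frozen=True)
SEQ = make_dataclass("SEQ", ["r1", "r2"], frozen=True)

# Values.
Void = make_dataclass("Void", [], frozen=True)
Char = make_dataclass("Char", ["c"], frozen=True)
Left = make_dataclass("Left", ["v"], frozen=True)
Right = make_dataclass("Right", ["v"], frozen=True)
Seq = make_dataclass("Seq", ["v1", "v2"], frozen=True)

Rect = Callable[[object], object]  # a rectification function on values
Pair = Tuple[object, Rect]         # (simplified expression, rectification)


# Auxiliary rectification functions, one per row of the table above.
def F_RIGHT(f: Rect) -> Rect:
    return lambda v: Right(f(v))


def F_LEFT(f: Rect) -> Rect:
    return lambda v: Left(f(v))


def F_ALT(f1: Rect, f2: Rect) -> Rect:
    return lambda v: Right(f2(v.v)) if isinstance(v, Right) else Left(f1(v.v))


def F_SEQ1(f1: Rect, f2: Rect) -> Rect:  # left component simplified to ONE
    return lambda v: Seq(f1(Void()), f2(v))


def F_SEQ2(f1: Rect, f2: Rect) -> Rect:  # right component simplified to ONE
    return lambda v: Seq(f1(v), f2(Void()))


def F_SEQ(f1: Rect, f2: Rect) -> Rect:
    return lambda v: Seq(f1(v.v1), f2(v.v2))


def simp_ALT(p1: Pair, p2: Pair) -> Pair:
    (r1, f1), (r2, f2) = p1, p2
    if isinstance(r1, ZERO):
        return r2, F_RIGHT(f2)
    if isinstance(r2, ZERO):
        return r1, F_LEFT(f1)
    return ALT(r1, r2), F_ALT(f1, f2)


def simp_SEQ(p1: Pair, p2: Pair) -> Pair:
    (r1, f1), (r2, f2) = p1, p2
    if isinstance(r1, ONE):
        return r2, F_SEQ1(f1, f2)
    if isinstance(r2, ONE):
        return r1, F_SEQ2(f1, f2)
    return SEQ(r1, r2), F_SEQ(f1, f2)


def simp(r) -> Pair:
    """Simplify bottom-up, returning the new expression and its rectification."""
    if isinstance(r, ALT):
        return simp_ALT(simp(r.r1), simp(r.r2))
    if isinstance(r, SEQ):
        return simp_SEQ(simp(r.r1), simp(r.r2))
    return r, (lambda v: v)


rs, fr = simp(ALT(ZERO(), SEQ(ONE(), CHAR("a"))))
print(rs)              # CHAR(c='a')
print(fr(Char("a")))   # Right(v=Seq(v1=Void(), v2=Char(c='a')))
```

In the example, @{text "ALT ZERO (SEQ ONE (CHAR a))"} simplifies to @{text "CHAR a"}, and rectification maps the value @{text "Char a"} back to @{text "Right (Seq () (Char a))"}, a value for the original expression.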
 \noindent
 The functions @{text "simp\<^bsub>Alt\<^esub>"} and @{text "simp\<^bsub>Seq\<^esub>"} encode the simplification rules
 is then recursively called with the simplified derivative, but before
 we inject the character @{term c} into the value @{term v}, we need to rectify
 @{term v} (that is, construct @{term "f\<^sub>r v"}). Before we can establish the correctness
 of @{term "slexer"}, we need to show that simplification preserves the language
 and that it also preserves our POSIX relation once the value is rectified
-(recall @{const "simp"} generates a regular expression, rectification function pair):
+(recall @{const "simp"} generates a regular expression / rectification function pair):
 \begin{lemma}\mbox{}\smallskip\\\label{slexeraux}
 \begin{tabular}{ll}
 (1) & @{thm L_fst_simp[symmetric]}\\
 (2) & @{thm[mode=IfThen] Posix_simp}
 \end{tabular}
 \end{lemma}
-\begin{proof}
+\begin{proof} Both are by induction on @{text r}. There is no
-Both are by induction on @{text r}. There is no interesting case for the
+interesting case for the first statement. For the second statement,
-first statement. For the second statement of interest are the @{term "r = SEQ r\<^sub>1 r\<^sub>2"}
+the cases of interest are @{term "r = ALT r\<^sub>1 r\<^sub>2"} and @{term "r = SEQ r\<^sub>1
-and @{term "r = ALT r\<^sub>1 r\<^sub>2"} cases.
+r\<^sub>2"}. In each case we have to analyse four subcases, depending on whether
+@{term "fst (simp r\<^sub>1)"} and @{term "fst (simp r\<^sub>2)"} equal @{const
+ZERO} (respectively @{const ONE}). For example, for @{term "r = ALT
+r\<^sub>1 r\<^sub>2"}, consider the subcase @{term "fst (simp r\<^sub>1) = ZERO"} and
+@{term "fst (simp r\<^sub>2) \<noteq> ZERO"}. By assumption we know @{term "s \<in>
+fst (simp (ALT r\<^sub>1 r\<^sub>2)) \<rightarrow> v"}. From this we can infer @{term "s \<in> fst (simp r\<^sub>2) \<rightarrow> v"}
+and by the induction hypothesis also (*) @{term "s \<in> r\<^sub>2 \<rightarrow> (snd (simp r\<^sub>2) v)"}. Given @{term "fst (simp r\<^sub>1) = ZERO"}
+we know @{term "L (fst (simp r\<^sub>1)) = {}"}. By the first statement,
+@{term "L r\<^sub>1"} is the empty set, meaning (**) @{term "s \<notin> L r\<^sub>1"}.
+Taking (*) and (**) together, the \mbox{@{text "P+R"}}-rule gives
+@{term "s \<in> ALT r\<^sub>1 r\<^sub>2 \<rightarrow> Right (snd (simp r\<^sub>2) v)"}. In turn this
+gives @{term "s \<in> ALT r\<^sub>1 r\<^sub>2 \<rightarrow> snd (simp (ALT r\<^sub>1 r\<^sub>2)) v"}, as required.
+The other cases are similar.\qed
 \end{proof}
 \noindent We can now prove relatively straightforwardly that the
 optimised lexer produces the expected result:
 final form, we make no comment thereon, preferring to give general reasons
 for our belief that the approach of \cite{Sulzmann2014} is problematic.
 Their central definition is an ``ordering relation'' defined by the
 rules (slightly adapted to fit our notation):
-\begin{center}
+\begin{center}\small
 \begin{tabular}{@ {}c@ {\hspace{4mm}}c@ {}}
 @{thm[mode=Rule] C2[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>1\<iota>" "v\<^sub>2" "r\<^sub>2" "v\<^sub>2\<iota>"]}\,(C2) &
 @{thm[mode=Rule] C1[of "v\<^sub>2" "r\<^sub>2" "v\<^sub>2\<iota>" "v\<^sub>1" "r\<^sub>1"]}\,(C1)\smallskip\\
 @{thm[mode=Rule] A1[of "v\<^sub>1" "v\<^sub>2" "r\<^sub>1" "r\<^sub>2"]}\,(A1) &

changeset 181	162f112b814b
parent 180	42ffaca7c85e
child 182	2e70c1b06ac0