(*<*)theory Paperimports "../ReStar" "~~/src/HOL/Library/LaTeXsugar"begindeclare [[show_question_marks = false]]abbreviation "der_syn r c \<equiv> der c r"notation (latex output) If ("(\<^raw:\textrm{>if\<^raw:}> (_)/ \<^raw:\textrm{>then\<^raw:}> (_)/ \<^raw:\textrm{>else\<^raw:}> (_))" 10) and Cons ("_\<^raw:\mbox{$\,$}>::\<^raw:\mbox{$\,$}>_" [78,77] 73) and ZERO ("\<^bold>0" 80) and ONE ("\<^bold>1" 80) and CHAR ("_" [1000] 80) and ALT ("_ + _" [77,77] 78) and SEQ ("_ \<cdot> _" [77,77] 78) and STAR ("_\<^sup>\<star>" [1000] 78) and val.Char ("Char _" [1000] 78) and val.Left ("Left _" [1000] 78) and val.Right ("Right _" [1000] 78) and L ("L'(_')" [10] 78) and der_syn ("_\\_" [79, 1000] 76) and flat ("|_|" [70] 73) and Sequ ("_ @ _" [78,77] 63) and injval ("inj _ _ _" [1000,77,1000] 77) and projval ("proj _ _ _" [1000,77,1000] 77) and length ("len _" [78] 73) (* and ValOrd ("_ \<succeq>\<^bsub>_\<^esub> _" [78,77,77] 73) *)(*>*)section {* Introduction *}text {*Brzozowski \cite{Brzozowski1964} introduced the notion of the {\emderivative} @{term "der c r"} of a regular expression @{text r} w.r.t.\ acharacter~@{text c}, and showed that it gave a simple solution to theproblem of matching a string @{term s} with a regular expression @{term r}:if the derivative of @{term r} w.r.t.\ (in succession) all the characters ofthe string matches the empty string, then @{term r} matches @{term s}(and {\em vice versa}). The derivative has the property (which may beregarded as its specification) that, for every string @{term s} and regularexpression @{term r} and character @{term c}, one has @{term "cs \<in> L(r)"} ifand only if \mbox{@{term "s \<in> L(der c r)"}}. The beauty of Brzozowski'sderivatives is that they are neatly expressible in any functional language,and easily definable and reasoned about in theorem provers---the definitionsjust consist of inductive datatypes and simple recursive functions. Acompletely formalised correctness proof of this matcher in for example HOL4has been given in~\cite{Owens2008}.One limitation of Brzozowski's matcher is that it only generates a YES/NOanswer for whether a string is being matched by a regular expression.Sulzmann and Lu \cite{Sulzmann2014} extended this matcher to allowgeneration not just of a YES/NO answer but of an actual matching, called a[lexical] {\em value}. They give a simple algorithm to calculate a valuethat appears to be the value associated with POSIX lexing\cite{Kuklewicz,Vansummeren2006}. The challenge then is to specify thatvalue, in an algorithm-independent fashion, and to show that Sulzamann andLu's derivative-based algorithm does indeed calculate a value that iscorrect according to the specification.The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define arelation (called an ``Order Relation'') on the set of values of @{term r},and to show that (once a string to be matched is chosen) there is a maximumelement and that it is computed by their derivative-based algorithm. Thisproof idea is inspired by work of Frisch and Cardelli \cite{Frisch2004} on aGREEDY regular expression matching algorithm. Beginning with ourobservations that, without evidence that it is transitive, it cannot becalled an ``order relation'', and that the relation is called a ``totalorder'' despite being evidently not total\footnote{\textcolor{green}{Weshould give an argument as footnote}}, we identify problems with thisapproach (of which some of the proofs are not published in\cite{Sulzmann2014}); perhaps more importantly, we give a simple inductive(and algorithm-independent) definition of what we call being a {\em POSIXvalue} for a regular expression @{term r} and a string @{term s}; we showthat the algorithm computes such a value and that such a value is unique.Proofs are both done by hand and checked in Isabelle/HOL. Theexperience of doing our proofs has been that this mechanical checking wasabsolutely essential: this subject area has hidden snares. This was alsonoted by Kuklewitz \cite{Kuklewicz} who found that nearly all POSIX matchingimplementations are ``buggy'' \cite[Page 203]{Sulzmann2014}.If a regular expression matches a string, then in general there are morethan one way of how the string is matched. There are two commonly useddisambiguation strategies to generate a unique answer: one is called GREEDYmatching \cite{Frisch2004} and the other is POSIXmatching~\cite{Kuklewicz,Sulzmann2014}. For example consider the string@{term xy} and the regular expression \mbox{@{term "STAR (ALT (ALT x y) xy)"}}.Either the string can be matched in two `iterations' by the singleletter-regular expressions @{term x} and @{term y}, or directly in oneiteration by @{term xy}. The first case corresponds to GREEDY matching,which first matches with the left-most symbol and only matches the nextsymbol in case of a mismatch (this is greedy in the sense of preferringinstant gratification to delayed repletion). The second case is POSIXmatching, which prefers the longest match.In the context of lexing, where an input string needs to be separated into asequence of tokens, POSIX is the more natural disambiguation strategy forwhat programmers consider basic syntactic building blocks in their programs.These building blocks are often specified by some regular expressions, say @{text"r\<^bsub>key\<^esub>"} and @{text "r\<^bsub>id\<^esub>"} for recognisingkeywords and identifiers, respectively. There are two underlying rulesbehind tokenising a string in a POSIX fashion:\begin{itemize} \item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):}The longest initial substring matched by any regular expression is taken asnext token.\smallskip\item[$\bullet$] \underline{Rule Priority:}For a particular longest initial substring, the first regular expressionthat can match determines the token.\end{itemize}\noindent Consider for example @{text "r\<^bsub>key\<^esub>"} recognisingkeywords such as @{text "if"}, @{text "then"} and so on; and @{text"r\<^bsub>id\<^esub>"} recognising identifiers (a single character followedby characters or numbers). Then we can form the regular expression @{text"(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} and use POSIXmatching to tokenise strings, say @{text "iffoo"} and @{text "if"}. In thefirst case we obtain by the longest match rule a single identifier token,not a keyword followed by identifier. In the second case we obtain by rulepriority a keyword token, not an identifier token---even if @{text"r\<^bsub>id\<^esub>"} matches also.\bigskip\noindent\textcolor{green}{Not Done Yet}\medskip\noindent{\bf Contributions:}Derivatives as calculated by Brzozowski's method are usually more complexregular expressions than the initial one; various optimisations arepossible, such as the simplifications of @{term "ALT ZERO r"}, @{term "ALT rZERO"}, @{term "SEQ ONE r"} and @{term "SEQ r ONE"} to @{term r}. One of theadvantages of having a simple specification and correctness proof is thatthe latter can be refined to allow for such optimisations and simplecorrectness proof.An extended version of \cite{Sulzmann2014} is available at the websiteof its first author; this includes some ``proofs'', claimed in\cite{Sulzmann2014} to be ``rigorous''. Since these are evidently not infinal form, we make no comment thereon, preferring to give generalreasons for our belief that the approach of \cite{Sulzmann2014} isproblematic rather than to discuss details of unpublished work.*}section {* Preliminaries *}text {* \noindent Strings in Isabelle/HOL are lists of characters with the empty string being represented by the empty list, written @{term "[]"}, and list-cons being written as @{term "DUMMY # DUMMY"}. Often we use the usual bracket notation for strings; for example a string consisting of a single character is written @{term "[c]"}. By using the type @{type char} for characters we have a supply of finitely many characters roughly corresponding to the ASCII character set. Regular expressions are defined as usual as the following inductive datatype: \begin{center} @{text "r :="} @{const "ZERO"} $\mid$ @{const "ONE"} $\mid$ @{term "CHAR c"} $\mid$ @{term "ALT r\<^sub>1 r\<^sub>2"} $\mid$ @{term "SEQ r\<^sub>1 r\<^sub>2"} $\mid$ @{term "STAR r"} \end{center} \noindent where @{const ZERO} stands for the regular expression that does not match any string and @{const ONE} for the regular expression that matches only the empty string. The language of a regular expression is again defined routinely by the recursive function @{term L} with the clauses: \begin{center} \begin{tabular}{rcl} @{thm (lhs) L.simps(1)} & $\dn$ & @{thm (rhs) L.simps(1)}\\ @{thm (lhs) L.simps(2)} & $\dn$ & @{thm (rhs) L.simps(2)}\\ @{thm (lhs) L.simps(3)} & $\dn$ & @{thm (rhs) L.simps(3)}\\ @{thm (lhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\ @{thm (lhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\ @{thm (lhs) L.simps(6)} & $\dn$ & @{thm (rhs) L.simps(6)}\\ \end{tabular} \end{center} \noindent In the fourth clause we use @{term "DUMMY ;; DUMMY"} for the concatenation of two languages. We use the star-notation for regular expressions and sets of strings (in the last clause). The star on sets is defined inductively as usual by two clauses for the empty string being in the star of a language and is @{term "s\<^sub>1"} is in a language and @{term "s\<^sub>2"} and in the star of this language then also @{term "s\<^sub>1 @ s\<^sub>2"} is in the star of this language. \emph{Semantic derivatives} of sets of strings are defined as \begin{center} \begin{tabular}{lcl} @{thm (lhs) Der_def} & $\dn$ & @{thm (rhs) Der_def}\\ \end{tabular} \end{center} \noindent where the second definitions lifts the notion of semantic derivatives from characters to strings. \noindent The nullable function \begin{center} \begin{tabular}{lcl} @{thm (lhs) nullable.simps(1)} & $\dn$ & @{thm (rhs) nullable.simps(1)}\\ @{thm (lhs) nullable.simps(2)} & $\dn$ & @{thm (rhs) nullable.simps(2)}\\ @{thm (lhs) nullable.simps(3)} & $\dn$ & @{thm (rhs) nullable.simps(3)}\\ @{thm (lhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\ @{thm (lhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\ @{thm (lhs) nullable.simps(6)} & $\dn$ & @{thm (rhs) nullable.simps(6)}\\ \end{tabular} \end{center} \noindent The derivative function for characters and strings \begin{center} \begin{tabular}{lcp{7.5cm}} @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\ @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\ @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\ @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\ @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]}\\ @{thm (lhs) der.simps(6)} & $\dn$ & @{thm (rhs) der.simps(6)} \end{tabular} \end{center} It is a relatively easy exercise to prove that \begin{center} \begin{tabular}{l} @{thm[mode=IfThen] nullable_correctness}\\ @{thm[mode=IfThen] der_correctness}\\ \end{tabular} \end{center}*}section {* POSIX Regular Expression Matching *}section {* The Argument by Sulzmmann and Lu *}section {* Conclusion *}text {* Nipkow lexer from 2000*}text {* \noindent Values \begin{center} @{text "v :="} @{const "Void"} $\mid$ @{term "val.Char c"} $\mid$ @{term "Left v"} $\mid$ @{term "Right v"} $\mid$ @{term "Seq v\<^sub>1 v\<^sub>2"} $\mid$ @{term "Stars vs"} \end{center} \noindent The language of a regular expression \begin{center} \begin{tabular}{lcl} @{thm (lhs) L.simps(1)} & $\dn$ & @{thm (rhs) L.simps(1)}\\ @{thm (lhs) L.simps(2)} & $\dn$ & @{thm (rhs) L.simps(2)}\\ @{thm (lhs) L.simps(3)} & $\dn$ & @{thm (rhs) L.simps(3)}\\ @{thm (lhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\ @{thm (lhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\ @{thm (lhs) L.simps(6)} & $\dn$ & @{thm (rhs) L.simps(6)}\\ \end{tabular} \end{center} \noindent The @{const flat} function for values \begin{center} \begin{tabular}{lcl} @{thm (lhs) flat.simps(1)} & $\dn$ & @{thm (rhs) flat.simps(1)}\\ @{thm (lhs) flat.simps(2)} & $\dn$ & @{thm (rhs) flat.simps(2)}\\ @{thm (lhs) flat.simps(3)} & $\dn$ & @{thm (rhs) flat.simps(3)}\\ @{thm (lhs) flat.simps(4)} & $\dn$ & @{thm (rhs) flat.simps(4)}\\ @{thm (lhs) flat.simps(5)[of "v\<^sub>1" "v\<^sub>2"]} & $\dn$ & @{thm (rhs) flat.simps(5)[of "v\<^sub>1" "v\<^sub>2"]}\\ @{thm (lhs) flat.simps(6)} & $\dn$ & @{thm (rhs) flat.simps(6)}\\ @{thm (lhs) flat.simps(7)} & $\dn$ & @{thm (rhs) flat.simps(7)}\\ \end{tabular} \end{center} \noindent The @{const mkeps} function \begin{center} \begin{tabular}{lcl} @{thm (lhs) mkeps.simps(1)} & $\dn$ & @{thm (rhs) mkeps.simps(1)}\\ @{thm (lhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]}\\ @{thm (lhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]}\\ @{thm (lhs) mkeps.simps(4)} & $\dn$ & @{thm (rhs) mkeps.simps(4)}\\ \end{tabular} \end{center} \noindent The @{text inj} function \begin{center} \begin{tabular}{lcl} @{thm (lhs) injval.simps(1)} & $\dn$ & @{thm (rhs) injval.simps(1)}\\ @{thm (lhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]} & $\dn$ & @{thm (rhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]}\\ @{thm (lhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ & @{thm (rhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\ @{thm (lhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ & @{thm (rhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\ @{thm (lhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ & @{thm (rhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\ @{thm (lhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ & @{thm (rhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\ @{thm (lhs) injval.simps(7)[of "r" "c" "v" "vs"]} & $\dn$ & @{thm (rhs) injval.simps(7)[of "r" "c" "v" "vs"]}\\ \end{tabular} \end{center} \noindent The inhabitation relation: \begin{center} \begin{tabular}{c} @{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\medskip\\ @{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]} \qquad @{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>1" "r\<^sub>2"]}\medskip\\ @{thm[mode=Axiom] Prf.intros(4)} \qquad @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\medskip\\ @{thm[mode=Axiom] Prf.intros(6)[of "r"]} \qquad @{thm[mode=Rule] Prf.intros(7)[of "v" "r" "vs"]}\medskip\\ \end{tabular} \end{center} \noindent We have also introduced a slightly restricted version of this relation where the last rule is restricted so that @{term "flat v \<noteq> []"}. This relation for \emph{non-problematic} is written @{term "\<Turnstile> v : r"}. \bigskip \noindent Our Posix relation @{term "s \<in> r \<rightarrow> v"} \begin{center} \begin{tabular}{c} @{thm[mode=Axiom] PMatch.intros(1)} \qquad @{thm[mode=Axiom] PMatch.intros(2)}\medskip\\ @{thm[mode=Rule] PMatch.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\qquad @{thm[mode=Rule] PMatch.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\medskip\\ \multicolumn{1}{p{5cm}}{@{thm[mode=Rule] PMatch.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}\medskip\\ @{thm[mode=Rule] PMatch.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}\medskip\\ @{thm[mode=Axiom] PMatch.intros(7)}\medskip\\ \end{tabular} \end{center} \noindent Our version of Sulzmann's ordering relation \begin{center} \begin{tabular}{c} @{thm[mode=Rule] ValOrd.intros(2)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>1'" "v\<^sub>2" "r\<^sub>2" "v\<^sub>2'"]} \qquad @{thm[mode=Rule] ValOrd.intros(1)[of "v\<^sub>2" "r\<^sub>2" "v\<^sub>2'" "v\<^sub>1" "r\<^sub>1"]}\medskip\\ @{thm[mode=Rule] ValOrd.intros(3)[of "v\<^sub>1" "v\<^sub>2" "r\<^sub>1" "r\<^sub>2"]} \qquad @{thm[mode=Rule] ValOrd.intros(4)[of "v\<^sub>2" "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]}\medskip\\ @{thm[mode=Rule] ValOrd.intros(5)[of "v\<^sub>2" "r\<^sub>2" "v\<^sub>2'" "r\<^sub>1"]} \qquad @{thm[mode=Rule] ValOrd.intros(6)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>1'" "r\<^sub>2"]} \medskip\\ @{thm[mode=Axiom] ValOrd.intros(7)}\qquad @{thm[mode=Axiom] ValOrd.intros(8)[of "c"]}\medskip\\ @{thm[mode=Rule] ValOrd.intros(9)[of "v" "vs" "r"]}\qquad @{thm[mode=Rule] ValOrd.intros(10)[of "v" "vs" "r"]}\medskip\\ @{thm[mode=Rule] ValOrd.intros(11)[of "v\<^sub>1" "r" "v\<^sub>2" "vs\<^sub>1" "vs\<^sub>2"]}\medskip\\ @{thm[mode=Rule] ValOrd.intros(12)[of "vs\<^sub>1" "r" "vs\<^sub>2" "v"]}\qquad @{thm[mode=Axiom] ValOrd.intros(13)[of "r"]}\medskip\\ \end{tabular} \end{center} \noindent A prefix of a string s \begin{center} \begin{tabular}{c} @{thm prefix_def[of "s\<^sub>1" "s\<^sub>2"]} \end{tabular} \end{center} \noindent Values and non-problematic values \begin{center} \begin{tabular}{c} @{thm Values_def}\medskip\\ \end{tabular} \end{center} \noindent The point is that for a given @{text s} and @{text r} there are only finitely many non-problematic values.*} text {* \noindent Some lemmas we have proved:\bigskip @{thm L_flat_Prf} @{thm L_flat_NPrf} @{thm[mode=IfThen] mkeps_nullable} @{thm[mode=IfThen] mkeps_flat} @{thm[mode=IfThen] Prf_injval} @{thm[mode=IfThen] Prf_injval_flat} @{thm[mode=IfThen] PMatch_mkeps} @{thm[mode=IfThen] PMatch1(2)} @{thm[mode=IfThen] PMatch1N} @{thm[mode=IfThen] PMatch_determ(1)[of "s" "r" "v\<^sub>1" "v\<^sub>2"]} \medskip \noindent This is the main theorem that lets us prove that the algorithm is correct according to @{term "s \<in> r \<rightarrow> v"}: @{thm[mode=IfThen] PMatch2} \mbox{}\bigskip \noindent {\bf Proof} The proof is by induction on the definition of @{const der}. Other inductions would go through as well. The interesting case is for @{term "SEQ r\<^sub>1 r\<^sub>2"}. First we analyse the case where @{term "nullable r\<^sub>1"}. We have by induction hypothesis \[ \begin{array}{l} (IH1)\quad @{text "\<forall>s v."} \text{\;if\;} @{term "s \<in> der c r\<^sub>1 \<rightarrow> v"} \text{\;then\;} @{term "(c # s) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v"}\\ (IH2)\quad @{text "\<forall>s v."} \text{\;if\;} @{term "s \<in> der c r\<^sub>2 \<rightarrow> v"} \text{\;then\;} @{term "(c # s) \<in> r\<^sub>2 \<rightarrow> injval r\<^sub>2 c v"} \end{array} \] \noindent and have \[ @{term "s \<in> ALT (SEQ (der c r\<^sub>1) r\<^sub>2) (der c r\<^sub>2) \<rightarrow> v"} \] \noindent There are two cases what @{term v} can be: (1) @{term "Left v'"} and (2) @{term "Right v'"}. \begin{itemize} \item[(1)] We know @{term "s \<in> SEQ (der c r\<^sub>1) r\<^sub>2 \<rightarrow> v'"} holds, from which we can infer that there are @{text "s\<^sub>1"}, @{term "s\<^sub>2"}, @{text "v\<^sub>1"}, @{term "v\<^sub>2"} with \[ @{term "s\<^sub>1 \<in> der c r\<^sub>1 \<rightarrow> v\<^sub>1"} \qquad\text{and}\qquad @{term "s\<^sub>2 \<in> r\<^sub>2 \<rightarrow> v\<^sub>2"} \] and also \[ @{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \<and> s\<^sub>1 @ s\<^sub>3 \<in> L (der c r\<^sub>1) \<and> s\<^sub>4 \<in> L r\<^sub>2)"} \] \noindent and have to prove \[ @{term "((c # s\<^sub>1) @ s\<^sub>2) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (injval r\<^sub>1 c v\<^sub>1) v\<^sub>2"} \] \noindent The two requirements @{term "(c # s\<^sub>1) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"} and @{term "s\<^sub>2 \<in> r\<^sub>2 \<rightarrow> v\<^sub>2"} can be proved by the induction hypothese (IH1) and the fact above. \noindent This leaves to prove \[ @{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \<and> (c # s\<^sub>1) @ s\<^sub>3 \<in> L r\<^sub>1 \<and> s\<^sub>4 \<in> L r\<^sub>2)"} \] \noindent which holds because @{term "(c # s\<^sub>1) @ s\<^sub>3 \<in> L r\<^sub>1 "} implies @{term "s\<^sub>1 @ s\<^sub>3 \<in> L (der c r\<^sub>1) "} \item[(2)] This case is similar. \end{itemize} \noindent The final case is that @{term " \<not> nullable r\<^sub>1"} holds. This case again similar to the cases above.*}text {* %\noindent %{\bf Acknowledgements:} %We are grateful for the comments we received from anonymous %referees. \bibliographystyle{plain} \bibliography{root} \section{Roy's Rules} \newcommand{\abs}[1]{\mid\!\! #1\!\! \mid} %%\newcommand{\mts}{\textit{``''} \newcommand{\tl}{\ \triangleleft\ } $$\inferrule[]{Void \tl \epsilon}{} \quad\quad \inferrule[]{Char\ c \tl Lit\ c}{} $$ $$\inferrule {v_1 \tl r_1} {Left\ v_1 \tl r_1 + r_2} \quad\quad \inferrule[] { v_2 \tl r_2 \\ \abs{v_2}\ \not\in\ L(r_1)} {Right\ v_2 \tl r_1 + r_2} $$ $$ \inferrule {v_1 \tl r_1\\ v_2 \tl r_2\\ s \in\ L(r_1\backslash\! \abs{v_1}) \ \land\ \abs{v_2}\!\backslash s\ \epsilon\ L(r_2) \ \Rightarrow\ s = [] } {(v_1, v_2) \tl r_1 \cdot r_2} $$ $$\inferrule { v \tl r \\ vs \tl r^* \\ \abs{v}\ \not=\ []} { (v :: vs) \tl r^* } \quad\quad \inferrule{} { [] \tl r^* } $$*}(*<*)end(*>*)