lexing: thys/Paper/Paper.thy@2c38f10643ae


(*<*)
theory Paper
imports "../ReStar" "~~/src/HOL/Library/LaTeXsugar"
begin

declare [[show_question_marks = false]]

abbreviation 
 "der_syn r c \<equiv> der c r"

notation (latex output)
   If  ("(\<^raw:\textrm{>if\<^raw:}> (_)/ \<^raw:\textrm{>then\<^raw:}> (_)/ \<^raw:\textrm{>else\<^raw:}> (_))" 10) and
  Cons ("_\<^raw:\mbox{$\,$}>::\<^raw:\mbox{$\,$}>_" [78,77] 73) and
  ZERO ("\<^bold>0" 80) and 
  ONE ("\<^bold>1" 80) and 
  CHAR ("_" [1000] 80) and
  ALT ("_ + _" [77,77] 78) and
  SEQ ("_ \<cdot> _" [77,77] 78) and
  STAR ("_\<^sup>\<star>" [1000] 78) and
  val.Char ("Char _" [1000] 78) and
  val.Left ("Left _" [1000] 78) and
  val.Right ("Right _" [1000] 78) and
  L ("L'(_')" [10] 78) and
  der_syn ("_\\_" [79, 1000] 76) and
  flat ("|_|" [70] 73) and
  Sequ ("_ @ _" [78,77] 63) and
  injval ("inj _ _ _" [1000,77,1000] 77) and 
  projval ("proj _ _ _" [1000,77,1000] 77) and 
  length ("len _" [78] 73) 
  (* and ValOrd ("_ \<succeq>\<^bsub>_\<^esub> _" [78,77,77] 73) *)
(*>*)

section {* Introduction *}

text {*

Brzozowski \cite{Brzozowski1964} introduced the notion of the {\em
derivative} @{term "der c r"} of a regular expression @{text r} w.r.t.\ a
character~@{text c}, and showed that it gave a simple solution to the
problem of matching a string @{term s} with a regular expression @{term r}:
if the derivative of @{term r} w.r.t.\ (in succession) all the characters of
the string matches the empty string $\mts$, then @{term r} matches @{term s}
(and {\em vice versa}). The derivative has the property (which may be
regarded as its specification) that, for every string @{term s} and regular
expression @{term r} and character @{term c}, one has @{term "cs \<in> L(r)"} if
and only if \mbox{@{term "s \<in> L(der c r)"}}. The beauty of Brzozowski's
derivatives is that they are neatly expressible in any functional language,
and easily definable and reasoned about in theorem provers---the definitions
just consist of inductive datatypes and simple recursive functions. A
completely formalised correctness proof of this matcher in for example HOL4
has been given in~\cite{Owens2008}.

One limitation of Brzozowski's matcher is that it only generates a YES/NO
answer for whether a string is being matched by a regular expression.
Sulzmann and Lu \cite{Sulzmann2014} extended this matcher to allow
generation not just of a YES/NO answer but of an actual matching, called a
[lexical] {\em value}. They give a simple algorithm to calculate a value
that appears to be the value associated with POSIX lexing
\cite{Kuklewicz,Vansummeren2006}. The challenge then is to specify that
value, in an algorithm-independent fashion, and to show that Sulzamann and
Lu's derivative-based algorithm does indeed calculate a value that is
correct according to the specification.

The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a
relation (called an ``Order Relation'') on the set of values of @{term r},
and to show that (once a string to be matched is chosen) there is a maximum
element and that it is computed by their derivative-based algorithm. This
proof idea is inspired by work of Frisch and Cardelli \cite{Frisch2004} on a
GREEDY regular expression matching algorithm. Beginning with our
observations that, without evidence that it is transitive, it cannot be
called an ``order relation'', and that the relation is called a ``total
order'' despite being evidently not total\footnote{\textcolor{green}{We
should give an argument as footnote}}, we identify problems with this
approach (of which some of the proofs are not published in
\cite{Sulzmann2014}); perhaps more importantly, we give a simple inductive
(and algorithm-independent) definition of what we call being a {\em POSIX
value} for a regular expression @{term r} and a string @{term s}; we show
that the algorithm computes such a value and that such a value is unique.
Proofs are both done by hand and checked in {\em Isabelle/HOL}. The
experience of doing our proofs has been that this mechanical checking was
absolutely essential: this subject area has hidden snares. This was also
noted by Kuklewitz \cite{Kuklewicz} who found that nearly all POSIX matching
implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}.

If a regular expression matches a string, then in general there are more
than one possibility of how the string is matched. There are two commonly
used disambiguation strategies to generate a unique answer: one is called
GREEDY matching \cite{Frisch2004} and the other is POSIX
matching~\cite{Kuklewicz,Sulzmann2014}. For example consider the string
@{term xy} and the regular expression @{term "STAR (ALT (ALT x y) xy)"}.
Either the string can be matched in two `iterations' by the single
letter-regular expressions @{term x} and @{term y}, or directly in one
iteration by @{term xy}. The first case corresponds to GREEDY matching,
which first matches with the left-most symbol and only matches the next
symbol in case of a mismatch. The second case is POSIX matching, which
prefers the longest match. 

In the context of lexing, where an input string is separated into a sequence
of tokens, POSIX is the more natural disambiguation strategy for what
programmers consider basic syntactic building blocks in their programs.
There are two underlying rules behind tokenising using POSIX matching:

\begin{itemize} 
\item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):}

The longest initial substring matched by any regular expression is taken as
next token.\smallskip

\item[$\bullet$] \underline{Rule Priority:}

For a particular longest initial substring, the first regular expression
that can match determines the token.
\end{itemize}
 
\noindent Consider for example a regular expression @{text
"r\<^bsub>key\<^esub>"} that can recognise keywords, like @{text "if"},
@{text "then"} and so on; and another regular expression @{text
"r\<^bsub>id\<^esub>"} that can recognise identifiers (such as single
characters followed by characters or numbers). Then we can form the regular
expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} and
use POSIX matching to, for example, tokenise the strings @{text "iffoo"} and
@{text "if"}. In the first case we obtain by the longest match rule a
single identifier token, not a keyword and identifier. In the second case we
obtain by rule priority a keyword token, not an identifier token.
\bigskip

\noindent\textcolor{green}{Not Done Yet}


\medskip\noindent
{\bf Contributions:}

Derivatives as calculated by Brzozowski's method are usually more complex
regular expressions than the initial one; various optimisations are
possible, such as the simplifications of @{term "ALT ZERO r"}, @{term "ALT r
ZERO"}, @{term "SEQ ONE r"} and @{term "SEQ r ONE"} to @{term r}. One of the
advantages of having a simple specification and correctness proof is that
the latter can be refined to allow for such optimisations and simple
correctness proof.

An extended version of \cite{Sulzmann2014} is available at the website
of its first author; this includes some ``proofs'', claimed in
\cite{Sulzmann2014} to be ``rigorous''. Since these are evidently not in
final form, we make no comment thereon, preferring to give general
reasons for our belief that the approach of \cite{Sulzmann2014} is
problematic rather than to discuss details of unpublished work.


*}

section {* Preliminaries *}

text {* \noindent Strings in Isabelle/HOL are lists of characters with
  the empty string being represented by the empty list, written @{term
  "[]"}, and list-cons being written as @{term "DUMMY # DUMMY"}.
  Often we use the usual bracket notation for strings; for example a
  string consiting of a single character is written @{term "[c]"}.  By
  using the type @{type char} for characters we have a supply of
  finitely many characters roughly corresponding to the ASCII
  character set.  Regular expression are as usual and defined as the
  following inductive datatype:

  \begin{center}
  @{text "r :="}
  @{const "ZERO"} $\mid$
  @{const "ONE"} $\mid$
  @{term "CHAR c"} $\mid$
  @{term "ALT r\<^sub>1 r\<^sub>2"} $\mid$
  @{term "SEQ r\<^sub>1 r\<^sub>2"} $\mid$
  @{term "STAR r"} 
  \end{center}

  \noindent where @{const ZERO} stands for the regular expression that
  does not macth any string and @{const ONE} for the regular
  expression that matches only the empty string. The language of a regular expression
  is again defined as usual by the following clauses

  \begin{center}
  \begin{tabular}{rcl}
  @{thm (lhs) L.simps(1)} & $\dn$ & @{thm (rhs) L.simps(1)}\\
  @{thm (lhs) L.simps(2)} & $\dn$ & @{thm (rhs) L.simps(2)}\\
  @{thm (lhs) L.simps(3)} & $\dn$ & @{thm (rhs) L.simps(3)}\\
  @{thm (lhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
  @{thm (lhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\
  @{thm (lhs) L.simps(6)} & $\dn$ & @{thm (rhs) L.simps(6)}\\
  \end{tabular}
  \end{center}
  
  \noindent We use the star-notation for regular expressions and sets of strings.
  The Kleene-star on sets is defined inductively.
  
  \emph{Semantic derivatives} of sets of strings are defined as

  \begin{center}
  \begin{tabular}{lcl}
  @{thm (lhs) Der_def} & $\dn$ & @{thm (rhs) Der_def}\\
  @{thm (lhs) Ders_def} & $\dn$ & @{thm (rhs) Ders_def}\\
  \end{tabular}
  \end{center}
  
  \noindent where the second definitions lifts the notion of semantic
  derivatives from characters to strings.  

  \noindent
  The nullable function

  \begin{center}
  \begin{tabular}{lcl}
  @{thm (lhs) nullable.simps(1)} & $\dn$ & @{thm (rhs) nullable.simps(1)}\\
  @{thm (lhs) nullable.simps(2)} & $\dn$ & @{thm (rhs) nullable.simps(2)}\\
  @{thm (lhs) nullable.simps(3)} & $\dn$ & @{thm (rhs) nullable.simps(3)}\\
  @{thm (lhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
  @{thm (lhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\
  @{thm (lhs) nullable.simps(6)} & $\dn$ & @{thm (rhs) nullable.simps(6)}\\
  \end{tabular}
  \end{center}

  \noindent
  The derivative function for characters and strings

  \begin{center}
  \begin{tabular}{lcp{7.5cm}}
  @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\
  @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\
  @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\
  @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\
  @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]}\\
  @{thm (lhs) der.simps(6)} & $\dn$ & @{thm (rhs) der.simps(6)}\medskip\\

  @{thm (lhs) ders.simps(1)} & $\dn$ & @{thm (rhs) ders.simps(1)}\\
  @{thm (lhs) ders.simps(2)} & $\dn$ & @{thm (rhs) ders.simps(2)}\\
  \end{tabular}
  \end{center}

  It is a relatively easy exercise to prove that

  \begin{center}
  \begin{tabular}{l}
  @{thm[mode=IfThen] nullable_correctness}\\
  @{thm[mode=IfThen] der_correctness}\\
  \end{tabular}
  \end{center}
*}

section {* POSIX Regular Expression Matching *}

section {* The Argument by Sulzmmann and Lu *}

section {* Conclusion *}

text {*

  Nipkow lexer from 2000

*}


text {*




  \noindent
  Values

  \begin{center}
  @{text "v :="}
  @{const "Void"} $\mid$
  @{term "val.Char c"} $\mid$
  @{term "Left v"} $\mid$
  @{term "Right v"} $\mid$
  @{term "Seq v\<^sub>1 v\<^sub>2"} $\mid$ 
  @{term "Stars vs"} 
  \end{center}  

  \noindent
  The language of a regular expression

  \begin{center}
  \begin{tabular}{lcl}
  @{thm (lhs) L.simps(1)} & $\dn$ & @{thm (rhs) L.simps(1)}\\
  @{thm (lhs) L.simps(2)} & $\dn$ & @{thm (rhs) L.simps(2)}\\
  @{thm (lhs) L.simps(3)} & $\dn$ & @{thm (rhs) L.simps(3)}\\
  @{thm (lhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
  @{thm (lhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\
  @{thm (lhs) L.simps(6)} & $\dn$ & @{thm (rhs) L.simps(6)}\\
  \end{tabular}
  \end{center}

 

  \noindent
  The @{const flat} function for values

  \begin{center}
  \begin{tabular}{lcl}
  @{thm (lhs) flat.simps(1)} & $\dn$ & @{thm (rhs) flat.simps(1)}\\
  @{thm (lhs) flat.simps(2)} & $\dn$ & @{thm (rhs) flat.simps(2)}\\
  @{thm (lhs) flat.simps(3)} & $\dn$ & @{thm (rhs) flat.simps(3)}\\
  @{thm (lhs) flat.simps(4)} & $\dn$ & @{thm (rhs) flat.simps(4)}\\
  @{thm (lhs) flat.simps(5)[of "v\<^sub>1" "v\<^sub>2"]} & $\dn$ & @{thm (rhs) flat.simps(5)[of "v\<^sub>1" "v\<^sub>2"]}\\
  @{thm (lhs) flat.simps(6)} & $\dn$ & @{thm (rhs) flat.simps(6)}\\
  @{thm (lhs) flat.simps(7)} & $\dn$ & @{thm (rhs) flat.simps(7)}\\
  \end{tabular}
  \end{center}

  \noindent
  The @{const mkeps} function

  \begin{center}
  \begin{tabular}{lcl}
  @{thm (lhs) mkeps.simps(1)} & $\dn$ & @{thm (rhs) mkeps.simps(1)}\\
  @{thm (lhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]}\\
  @{thm (lhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]}\\
  @{thm (lhs) mkeps.simps(4)} & $\dn$ & @{thm (rhs) mkeps.simps(4)}\\
  \end{tabular}
  \end{center}

  \noindent
  The @{text inj} function

  \begin{center}
  \begin{tabular}{lcl}
  @{thm (lhs) injval.simps(1)} & $\dn$ & @{thm (rhs) injval.simps(1)}\\
  @{thm (lhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]} & $\dn$ & 
      @{thm (rhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]}\\
  @{thm (lhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ & 
      @{thm (rhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\
  @{thm (lhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ 
      & @{thm (rhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\
  @{thm (lhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ 
      & @{thm (rhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\
  @{thm (lhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ 
      & @{thm (rhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\
  @{thm (lhs) injval.simps(7)[of "r" "c" "v" "vs"]} & $\dn$ 
      & @{thm (rhs) injval.simps(7)[of "r" "c" "v" "vs"]}\\
  \end{tabular}
  \end{center}

  \noindent
  The inhabitation relation:

  \begin{center}
  \begin{tabular}{c}
  @{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\medskip\\ 
  @{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]} \qquad 
  @{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>1" "r\<^sub>2"]}\medskip\\
  @{thm[mode=Axiom] Prf.intros(4)} \qquad 
  @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\medskip\\
  @{thm[mode=Axiom] Prf.intros(6)[of "r"]} \qquad 
  @{thm[mode=Rule] Prf.intros(7)[of "v" "r" "vs"]}\medskip\\
  \end{tabular}
  \end{center}

  \noindent
  We have also introduced a slightly restricted version of this relation
  where the last rule is restricted so that @{term "flat v \<noteq> []"}.
  This relation for \emph{non-problematic} is written @{term "\<Turnstile> v : r"}.
  \bigskip


  \noindent
  Our Posix relation @{term "s \<in> r \<rightarrow> v"}

  \begin{center}
  \begin{tabular}{c}
  @{thm[mode=Axiom] PMatch.intros(1)} \qquad
  @{thm[mode=Axiom] PMatch.intros(2)}\medskip\\
  @{thm[mode=Rule] PMatch.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\qquad
  @{thm[mode=Rule] PMatch.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\medskip\\
  \multicolumn{1}{p{5cm}}{@{thm[mode=Rule] PMatch.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}\medskip\\
  @{thm[mode=Rule] PMatch.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}\medskip\\
  @{thm[mode=Axiom] PMatch.intros(7)}\medskip\\
  \end{tabular}
  \end{center}

  \noindent
  Our version of Sulzmann's ordering relation

  \begin{center}
  \begin{tabular}{c}
  @{thm[mode=Rule] ValOrd.intros(2)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>1'" "v\<^sub>2" "r\<^sub>2" "v\<^sub>2'"]} \qquad
  @{thm[mode=Rule] ValOrd.intros(1)[of "v\<^sub>2" "r\<^sub>2" "v\<^sub>2'" "v\<^sub>1" "r\<^sub>1"]}\medskip\\
  @{thm[mode=Rule] ValOrd.intros(3)[of "v\<^sub>1" "v\<^sub>2" "r\<^sub>1" "r\<^sub>2"]} \qquad
  @{thm[mode=Rule] ValOrd.intros(4)[of "v\<^sub>2" "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]}\medskip\\ 
  @{thm[mode=Rule] ValOrd.intros(5)[of "v\<^sub>2" "r\<^sub>2" "v\<^sub>2'" "r\<^sub>1"]} \qquad
  @{thm[mode=Rule] ValOrd.intros(6)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>1'"  "r\<^sub>2"]} \medskip\\
  @{thm[mode=Axiom] ValOrd.intros(7)}\qquad
  @{thm[mode=Axiom] ValOrd.intros(8)[of "c"]}\medskip\\
  @{thm[mode=Rule] ValOrd.intros(9)[of "v" "vs" "r"]}\qquad
  @{thm[mode=Rule] ValOrd.intros(10)[of "v" "vs" "r"]}\medskip\\
  @{thm[mode=Rule] ValOrd.intros(11)[of "v\<^sub>1" "r" "v\<^sub>2" "vs\<^sub>1" "vs\<^sub>2"]}\medskip\\
  @{thm[mode=Rule] ValOrd.intros(12)[of "vs\<^sub>1" "r" "vs\<^sub>2" "v"]}\qquad
  @{thm[mode=Axiom] ValOrd.intros(13)[of "r"]}\medskip\\
  \end{tabular}
  \end{center}

  \noindent
  A prefix of a string s

  \begin{center}
  \begin{tabular}{c}
  @{thm prefix_def[of "s\<^sub>1" "s\<^sub>2"]}
  \end{tabular}
  \end{center}

  \noindent
  Values and non-problematic values

  \begin{center}
  \begin{tabular}{c}
  @{thm Values_def}\medskip\\
  \end{tabular}
  \end{center}


  \noindent
  The point is that for a given @{text s} and @{text r} there are only finitely many
  non-problematic values.
*} 

text {* 
  \noindent
  Some lemmas we have proved:\bigskip
  
  @{thm L_flat_Prf}

  @{thm L_flat_NPrf}

  @{thm[mode=IfThen] mkeps_nullable}

  @{thm[mode=IfThen] mkeps_flat}

  @{thm[mode=IfThen] Prf_injval}

  @{thm[mode=IfThen] Prf_injval_flat}
  
  @{thm[mode=IfThen] PMatch_mkeps}
  
  @{thm[mode=IfThen] PMatch1(2)}

  @{thm[mode=IfThen] PMatch1N}

  @{thm[mode=IfThen] PMatch_determ(1)[of "s" "r" "v\<^sub>1" "v\<^sub>2"]}

  \medskip
  \noindent
  This is the main theorem that lets us prove that the algorithm is correct according to
  @{term "s \<in> r \<rightarrow> v"}:

  @{thm[mode=IfThen] PMatch2}

  \mbox{}\bigskip
  
  \noindent {\bf Proof} The proof is by induction on the definition of
  @{const der}. Other inductions would go through as well. The
  interesting case is for @{term "SEQ r\<^sub>1 r\<^sub>2"}. First we analyse the
  case where @{term "nullable r\<^sub>1"}. We have by induction hypothesis

  \[
  \begin{array}{l}
  (IH1)\quad @{text "\<forall>s v."} \text{\;if\;} @{term "s \<in> der c r\<^sub>1 \<rightarrow> v"} 
  \text{\;then\;} @{term "(c # s) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v"}\\
  (IH2)\quad @{text "\<forall>s v."} \text{\;if\;} @{term "s \<in> der c r\<^sub>2 \<rightarrow> v"} 
  \text{\;then\;} @{term "(c # s) \<in> r\<^sub>2 \<rightarrow> injval r\<^sub>2 c v"}
  \end{array}
  \]
  
  \noindent
  and have 

  \[
  @{term "s \<in> ALT (SEQ (der c r\<^sub>1) r\<^sub>2) (der c r\<^sub>2) \<rightarrow> v"}
  \]
  
  \noindent
  There are two cases what @{term v} can be: (1) @{term "Left v'"} and (2) @{term "Right v'"}.

  \begin{itemize}
  \item[(1)] We know @{term "s \<in> SEQ (der c r\<^sub>1) r\<^sub>2 \<rightarrow> v'"} holds, from which we
  can infer that there are @{text "s\<^sub>1"}, @{term "s\<^sub>2"}, @{text "v\<^sub>1"}, @{term "v\<^sub>2"}
  with

  \[
  @{term "s\<^sub>1 \<in> der c r\<^sub>1 \<rightarrow> v\<^sub>1"} \qquad\text{and}\qquad @{term "s\<^sub>2 \<in> r\<^sub>2 \<rightarrow> v\<^sub>2"}
  \]

  and also

  \[
  @{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \<and> s\<^sub>1 @ s\<^sub>3 \<in> L (der c r\<^sub>1) \<and> s\<^sub>4 \<in> L r\<^sub>2)"}
  \]

  \noindent
  and have to prove
  
  \[
  @{term "((c # s\<^sub>1) @ s\<^sub>2) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (injval r\<^sub>1 c v\<^sub>1) v\<^sub>2"}
  \]

  \noindent
  The two requirements @{term "(c # s\<^sub>1) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"} and 
  @{term "s\<^sub>2 \<in> r\<^sub>2 \<rightarrow> v\<^sub>2"} can be proved by the induction hypothese (IH1) and the
  fact above.

  \noindent
  This leaves to prove
  
  \[
  @{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \<and> (c # s\<^sub>1) @ s\<^sub>3 \<in> L r\<^sub>1 \<and> s\<^sub>4 \<in> L r\<^sub>2)"}
  \]
  
  \noindent
  which holds because @{term "(c # s\<^sub>1) @ s\<^sub>3 \<in> L r\<^sub>1 "} implies @{term "s\<^sub>1 @ s\<^sub>3 \<in> L (der c r\<^sub>1) "}

  \item[(2)] This case is similar.
  \end{itemize}

  \noindent 
  The final case is that @{term " \<not> nullable r\<^sub>1"} holds. This case again similar
  to the cases above.
*}


text {*
  %\noindent
  %{\bf Acknowledgements:}
  %We are grateful for the comments we received from anonymous
  %referees.

  \bibliographystyle{plain}
  \bibliography{root}

  \section{Roy's Rules}

   \newcommand{\abs}[1]{\mid\!\! #1\!\! \mid}
   %%\newcommand{\mts}{\textit{``''}
   \newcommand{\tl}{\ \triangleleft\ }
   $$\inferrule[]{Void \tl \epsilon}{}
            \quad\quad
     \inferrule[]{Char\ c \tl Lit\ c}{}
   $$
   $$\inferrule
       {v_1 \tl r_1}
       {Left\ v_1 \tl r_1 + r_2}
   \quad\quad
     \inferrule[]
       { v_2 \tl r_2 \\ \abs{v_2}\ \not\in\ L(r_1)}
       {Right\ v_2 \tl r_1 + r_2}
   $$
   $$
   \inferrule
       {v_1 \tl r_1\\
        v_2 \tl r_2\\
        s \in\  L(r_1\backslash\! \abs{v_1}) \ \land\
        \abs{v_2}\!\backslash s\ \epsilon\ L(r_2)
        \ \Rightarrow\ s = []
       }
       {(v_1, v_2) \tl r_1 \cdot r_2}
   $$
   $$\inferrule
         { v \tl r \\ vs \tl r^* \\ \abs{v}\ \not=\ []} 
         { (v :: vs) \tl r^* }
   \quad\quad
       \inferrule{}
         { []  \tl r^* }       
   $$

*}


(*<*)
end
(*>*)
author	Christian Urban <christian dot urban at kcl dot ac dot uk>
	Wed, 02 Mar 2016 04:13:25 +0000
changeset 109	2c38f10643ae
parent 108	73f7dc60c285
child 110	267afb7fb700
permissions	-rw-r--r--