thys3/Paper.thy
changeset 615 8881a09a06fd
parent 599 a5f666410101
child 616 8907d4b6316d
equal deleted inserted replaced
614:d5e9bcb384ec 615:8881a09a06fd
   173 ``aggressive'' simplification method that keeps the size of the
   173 ``aggressive'' simplification method that keeps the size of the
   174 derivatives finitely bounded no matter what the length of the string is.
   174 derivatives finitely bounded no matter what the length of the string is.
   175 They make some informal claims about the correctness and linear behaviour
   175 They make some informal claims about the correctness and linear behaviour
   176 of this version, but do not provide any supporting proof arguments, not
   176 of this version, but do not provide any supporting proof arguments, not
   177 even ``pencil-and-paper'' arguments. They write about their bitcoded
   177 even ``pencil-and-paper'' arguments. They write about their bitcoded
   178 \emph{incremental parsing method} (that is the algorithm to be formalised
   178 \emph{incremental parsing method} (that is the algorithm to be fixed and formalised
   179 in this paper):
   179 in this paper):
   180 %
   180 %
   181 \begin{quote}\it
   181 \begin{quote}\it
   182   ``Correctness Claim: We further claim that the incremental parsing
   182   ``Correctness Claim: We further claim that the incremental parsing
   183   method [..] in combination with the simplification steps [..]
   183   method [..] in combination with the simplification steps [..]
   184   yields POSIX parse trees. We have tested this claim
   184   yields POSIX parse trees. We have tested this claim
   185   extensively [..] but yet
   185   extensively [..] but yet
   186   have to work out all proof details.'' \cite[Page 14]{Sulzmann2014}
   186   have to work out all proof details.'' \cite[Page 14]{Sulzmann2014}
   187 \end{quote}  
   187 \end{quote}  
   188 
   188 
   189 \noindent{}\textbf{Contributions:} We have implemented in Isabelle/HOL
   189 \noindent{}\textbf{Contributions:} We fill this gap by implementing in Isabelle/HOL
   190 the derivative-based lexing algorithm of Sulzmann and Lu
   190 our version of the derivative-based lexing algorithm of Sulzmann and Lu
   191 \cite{Sulzmann2014} where regular expressions are annotated with
   191 \cite{Sulzmann2014} where regular expressions are annotated with
   192 bitsequences. We define the crucial simplification function as a
   192 bitsequences. We define the crucial simplification function as a
   193 recursive function, without the need of a fixpoint operation. One objective of
   193 recursive function, without the need of a fixpoint operation. One objective of
   194 the simplification function is to remove duplicates of regular
   194 the simplification function is to remove duplicates of regular
   195 expressions.  For this Sulzmann and Lu use in their paper the standard
   195 expressions.  For this Sulzmann and Lu use in their paper the standard
   198 reason is that in the bitcoded setting, each copy generally has a
   198 reason is that in the bitcoded setting, each copy generally has a
   199 different bitcode annotation---so @{text nub} would never ``fire''.
   199 different bitcode annotation---so @{text nub} would never ``fire''.
   200 Inspired by Scala's library for lists, we shall instead use a @{text
   200 Inspired by Scala's library for lists, we shall instead use a @{text
   201 distinctWith} function that finds duplicates under an ``erasing'' function
   201 distinctWith} function that finds duplicates under an ``erasing'' function
   202 which deletes bitcodes before comparing regular expressions.
   202 which deletes bitcodes before comparing regular expressions.
   203 We shall also introduce our \emph{own} argument and definitions for
   203 We shall also introduce our \emph{own} arguments and definitions for
   204 establishing the correctness of the bitcoded algorithm when 
   204 establishing the correctness of the bitcoded algorithm when 
   205 simplifications are included. Finally we
   205 simplifications are included. Finally we
   206 establish that the size of derivatives can be finitely bounded.\medskip%\footnote{ In this paper, we shall first briefly introduce the basic notions
   206 establish that the size of derivatives can be finitely bounded.\medskip%\footnote{ In this paper, we shall first briefly introduce the basic notions
   207 %of regular expressions and describe the definition
   207 %of regular expressions and describe the definition
   208 %of POSIX lexing from our earlier work \cite{AusafDyckhoffUrban2016}. This serves
   208 %of POSIX lexing from our earlier work \cite{AusafDyckhoffUrban2016}. This serves
   298   $r\backslash c$ for the derivative of the regular expression $r$
   298   $r\backslash c$ for the derivative of the regular expression $r$
   299   w.r.t.~the character $c$. Both functions are defined by recursion over
   299   w.r.t.~the character $c$. Both functions are defined by recursion over
   300   regular expressions.  
   300   regular expressions.  
   301 %
   301 %
   302 \begin{center}
   302 \begin{center}
   303 \begin{tabular}{c @ {\hspace{-1mm}}c}
   303 \begin{tabular}{c @ {\hspace{-9mm}}c}
   304   \begin{tabular}{r@ {\hspace{2mm}}c@ {\hspace{2mm}}l}
   304   \begin{tabular}{r@ {\hspace{2mm}}c@ {\hspace{2mm}}l}
   305   @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\
   305   @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\
   306   @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\
   306   @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\
   307   @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\
   307   @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\
   308   @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\
   308   @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\
   328 \end{tabular}  
   328 \end{tabular}  
   329 \end{center}
   329 \end{center}
   330 
   330 
   331   \noindent
   331   \noindent
   332   We can extend this definition to give derivatives w.r.t.~strings,
   332   We can extend this definition to give derivatives w.r.t.~strings,
   333   namely @{thm (lhs) ders.simps(1)}$\dn$@{thm (rhs) ders.simps(1)}
   333   namely as @{thm (lhs) ders.simps(1)}$\dn$@{thm (rhs) ders.simps(1)}
   334   and @{thm (lhs) ders.simps(2)}$\dn$@{thm (rhs) ders.simps(2)}.
   334   and @{thm (lhs) ders.simps(2)}$\dn$@{thm (rhs) ders.simps(2)}.
   335   Using @{text nullable} and the derivative operation, we can
   335   Using @{text nullable} and the derivative operation, we can
   336 define a simple regular expression matcher, namely
   336 define a simple regular expression matcher, namely
   337 $@{text "match s r"} \;\dn\; @{term nullable}(r\backslash s)$.
   337 $@{text "match s r"} \dn @{term nullable}(r\backslash s)$.
   338 This is essentially Brzozowski's algorithm from 1964. Its
   338 This is essentially Brzozowski's algorithm from 1964. Its
   339 main virtue is that the algorithm can be easily implemented as a
   339 main virtue is that the algorithm can be easily implemented as a
   340 functional program (either in a functional programming language or in
   340 functional program (either in a functional programming language or in
   341 a theorem prover). The correctness of @{text match} amounts to
   341 a theorem prover). The correctness of @{text match} amounts to
   342 establishing the property:%\footnote{It is a fun exercise to formally prove this property in a theorem prover.}
   342 establishing the property:%\footnote{It is a fun exercise to formally prove this property in a theorem prover.}
   355 Also Ribeiro and Du Bois give one in Agda~\cite{RibeiroAgda2017}.
   355 Also Ribeiro and Du Bois give one in Agda~\cite{RibeiroAgda2017}.
   356 
   356 
   357 The novel idea of Sulzmann and Lu is to extend this algorithm for 
   357 The novel idea of Sulzmann and Lu is to extend this algorithm for 
   358 lexing, where it is important to find out which part of the string
   358 lexing, where it is important to find out which part of the string
   359 is matched by which part of the regular expression.
   359 is matched by which part of the regular expression.
   360 For this Sulzmann and Lu presented two lexing algorithms in their paper
   360 For this Sulzmann and Lu presented two lexing algorithms in their
   361   \cite{Sulzmann2014}. The first algorithm consists of two phases: first a
   361 paper~\cite{Sulzmann2014}. The first algorithm consists of two phases: first a
   362   matching phase (which is Brzozowski's algorithm) and then a value
   362   matching phase (which is Brzozowski's algorithm) and then a value
   363   construction phase. The values encode \emph{how} a regular expression
   363   construction phase. The values encode \emph{how} a regular expression
   364   matches a string. \emph{Values} are defined as the inductive datatype
   364   matches a string. \emph{Values} are defined as the inductive datatype
   365   %
   365   %
   366   \begin{center}
   366   \begin{center}
   382   Sulzmann and Lu also define inductively an
   382   Sulzmann and Lu also define inductively an
   383   inhabitation relation that associates values to regular expressions. Our
   383   inhabitation relation that associates values to regular expressions. Our
   384   version of this relation is defined the following six rules for the values:
   384   version of this relation is defined the following six rules for the values:
   385   %
   385   %
   386   \begin{center}
   386   \begin{center}
   387   \begin{tabular}{@ {}c@ {}}
   387   \begin{tabular}{@ {}l@ {}}
   388   @{thm[mode=Axiom] Prf.intros(4)}\qquad
   388   @{thm[mode=Axiom] Prf.intros(4)}\quad
       
   389   @{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]}\quad
       
   390   @{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>2" "r\<^sub>1"]}\quad
       
   391   @{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\medskip\\
   389   @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\qquad
   392   @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\qquad
   390   @{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]}\qquad
       
   391   @{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>2" "r\<^sub>1"]}\qquad
       
   392   @{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\medskip\\
       
   393   @{thm[mode=Rule] Prf.intros(6)[of "vs" "r"]}\qquad
   393   @{thm[mode=Rule] Prf.intros(6)[of "vs" "r"]}\qquad
   394   $\mprset{flushleft}\inferrule{
   394   $\mprset{flushleft}\inferrule{
   395   @{thm (prem 1) Prf.intros(7)[of "vs\<^sub>1" "r"  "vs\<^sub>2" "n"]}\\\\
   395   @{thm (prem 1) Prf.intros(7)[of "vs\<^sub>1" "r"  "vs\<^sub>2" "n"]}\\\\
   396   @{thm (prem 2) Prf.intros(7)[of "vs\<^sub>1" "r"  "vs\<^sub>2" "n"]}\quad
   396   @{thm (prem 2) Prf.intros(7)[of "vs\<^sub>1" "r"  "vs\<^sub>2" "n"]}\\\\
   397   @{thm (prem 3) Prf.intros(7)[of "vs\<^sub>1" "r"  "vs\<^sub>2" "n"]}
   397   @{thm (prem 3) Prf.intros(7)[of "vs\<^sub>1" "r"  "vs\<^sub>2" "n"]}
   398   }
   398   }
   399   {@{thm (concl) Prf.intros(7)[of "vs\<^sub>1" "r"  "vs\<^sub>2" "n"]}}
   399   {@{thm (concl) Prf.intros(7)[of "vs\<^sub>1" "r"  "vs\<^sub>2" "n"]}}
   400   $
   400   $
   401   \end{tabular}
   401   \end{tabular}
   486   @{thm (lhs) mkeps.simps(1)} & $\dn$ & @{thm (rhs) mkeps.simps(1)}\\
   486   @{thm (lhs) mkeps.simps(1)} & $\dn$ & @{thm (rhs) mkeps.simps(1)}\\
   487   @{thm (lhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]}\\
   487   @{thm (lhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]}\\
   488   @{thm (lhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]}\\
   488   @{thm (lhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]}\\
   489   @{thm (lhs) mkeps.simps(4)} & $\dn$ & @{thm (rhs) mkeps.simps(4)}\\
   489   @{thm (lhs) mkeps.simps(4)} & $\dn$ & @{thm (rhs) mkeps.simps(4)}\\
   490   @{thm (lhs) mkeps.simps(5)} & $\dn$ & @{thm (rhs) mkeps.simps(5)}\\
   490   @{thm (lhs) mkeps.simps(5)} & $\dn$ & @{thm (rhs) mkeps.simps(5)}\\
   491   \end{tabular}\medskip\\
   491   \end{tabular}
   492 
   492   \medskip\\
   493   \begin{tabular}{@ {}cc@ {}}
   493 
       
   494   %!\begin{tabular}{@ {}cc@ {}}
   494   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}@ {}}
   495   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}@ {}}
   495   @{thm (lhs) injval.simps(1)} & $\dn$ & @{thm (rhs) injval.simps(1)}\\
   496   @{thm (lhs) injval.simps(1)} & $\dn$ & @{thm (rhs) injval.simps(1)}\\
   496   @{thm (lhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]} & $\dn$ & 
   497   @{thm (lhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]} & $\dn$ & 
   497       @{thm (rhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]}\\
   498       @{thm (rhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]}\\
   498   @{thm (lhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ & 
   499   @{thm (lhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ & 
   499       @{thm (rhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\
   500       @{thm (rhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\
       
   501   %!
       
   502   @{thm (lhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ 
       
   503       & @{thm (rhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\
       
   504   @{thm (lhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ 
       
   505       & @{thm (rhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\
       
   506   @{thm (lhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ &
       
   507   @{thm (rhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\
       
   508   %!
   500   @{thm (lhs) injval.simps(7)[of "r" "c" "v" "vs"]} & $\dn$ 
   509   @{thm (lhs) injval.simps(7)[of "r" "c" "v" "vs"]} & $\dn$ 
   501       & @{thm (rhs) injval.simps(7)[of "r" "c" "v" "vs"]}\\
   510       & @{thm (rhs) injval.simps(7)[of "r" "c" "v" "vs"]}\\
   502   @{thm (lhs) injval.simps(8)[of "r" "n" "c" "v" "vs"]} & $\dn$ 
   511   @{thm (lhs) injval.simps(8)[of "r" "n" "c" "v" "vs"]} & $\dn$ 
   503       & @{thm (rhs) injval.simps(8)[of "r" "n" "c" "v" "vs"]}
   512       & @{thm (rhs) injval.simps(8)[of "r" "n" "c" "v" "vs"]}
   504   \end{tabular}
   513   \end{tabular}
   505   &
   514   %!&
   506   \begin{tabular}{@ {\hspace{-3mm}}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}@ {}}
   515   %!\begin{tabular}{@ {\hspace{-3mm}}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}@ {}}
   507   @{thm (lhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ 
   516   
   508       & @{thm (rhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\
   517   %!\end{tabular}
   509   @{thm (lhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ 
   518   %!\end{tabular}
   510       & @{thm (rhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\
   519   \end{tabular}%!\smallskip
   511   @{thm (lhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$\\ 
       
   512   \multicolumn{3}{@ {}r@ {}}{@{thm (rhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}}\\ \mbox{}\\ \mbox{}
       
   513   \end{tabular}
       
   514   \end{tabular}
       
   515   \end{tabular}\smallskip
       
   516   \end{center}
   520   \end{center}
   517 
   521 
   518   \noindent
   522   \noindent
   519   The function @{text mkeps} is run when the last derivative is nullable, that is
   523   The function @{text mkeps} is run when the last derivative is nullable, that is
   520   the string to be matched is in the language of the regular expression. It generates
   524   the string to be matched is in the language of the regular expression. It generates
   603   \begin{center}
   607   \begin{center}
   604   \begin{tabular}{lcl}
   608   \begin{tabular}{lcl}
   605   @{thm (lhs) lexer.simps(1)} & $\dn$ & @{thm (rhs) lexer.simps(1)}\\
   609   @{thm (lhs) lexer.simps(1)} & $\dn$ & @{thm (rhs) lexer.simps(1)}\\
   606   @{thm (lhs) lexer.simps(2)} & $\dn$ & @{text "case"} @{term "lexer (der c r) s"} @{text of}
   610   @{thm (lhs) lexer.simps(2)} & $\dn$ & @{text "case"} @{term "lexer (der c r) s"} @{text of}
   607                      @{term "None"}  @{text "\<Rightarrow>"} @{term None}\\
   611                      @{term "None"}  @{text "\<Rightarrow>"} @{term None}\\
   608                      & & \hspace{24mm}$|\;$ @{term "Some v"} @{text "\<Rightarrow>"} @{term "Some (injval r c v)"}                          
   612                      & & \hspace{27mm}$|\;$ @{term "Some v"} @{text "\<Rightarrow>"} @{term "Some (injval r c v)"}                          
   609   \end{tabular}
   613   \end{tabular}
   610   \end{center}
   614   \end{center}
   611 
   615 
   612 
   616 
   613 We have shown in our earlier paper \cite{AusafDyckhoffUrban2016} that
   617 We have shown in our earlier paper \cite{AusafDyckhoffUrban2016} that
   622   \noindent
   626   \noindent
   623   With this in place we were able to prove:
   627   With this in place we were able to prove:
   624 
   628 
   625   \begin{proposition}\mbox{}\label{lexercorrect}
   629   \begin{proposition}\mbox{}\label{lexercorrect}
   626   \textrm{(1)} @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}.\\
   630   \textrm{(1)} @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}.\\
   627   \mbox{\hspace{29mm}}\textrm{(2)}\; @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}.
   631   \mbox{\hspace{23.5mm}}\textrm{(2)}\; @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}.
   628   %
   632   %
   629   % \smallskip\\
   633   % \smallskip\\
   630   %\begin{tabular}{ll}
   634   %\begin{tabular}{ll}
   631   %(1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\
   635   %(1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\
   632   %(2) & @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}\\
   636   %(2) & @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}\\
   649 
   653 
   650   In the second part of their paper \cite{Sulzmann2014},
   654   In the second part of their paper \cite{Sulzmann2014},
   651   Sulzmann and Lu describe another algorithm that also generates POSIX
   655   Sulzmann and Lu describe another algorithm that also generates POSIX
   652   values but dispenses with the second phase where characters are
   656   values but dispenses with the second phase where characters are
   653   injected ``back'' into values. For this they annotate bitcodes to
   657   injected ``back'' into values. For this they annotate bitcodes to
   654   regular expressions, which we define in Isabelle/HOL as the datatype
   658   regular expressions, which we define in Isabelle/HOL as the datatype\medskip
   655 
   659 
   656   \begin{center}
   660   %!\begin{center}
   657   @{term breg} $\;::=\;$ @{term "AZERO"}
   661   \noindent
   658                $\;\mid\;$ @{term "AONE bs"}
   662   \begin{minipage}{1.01\textwidth}
   659                $\;\mid\;$ @{term "ACHAR bs c"}
   663   \,@{term breg} $\,::=\,$ @{term "AZERO"}
   660                $\;\mid\;$ @{term "AALTs bs rs"}
   664                $\,\mid$ @{term "AONE bs"}
   661                $\;\mid\;$ @{term "ASEQ bs r\<^sub>1 r\<^sub>2"}
   665                $\,\mid$ @{term "ACHAR bs c"}
   662                $\;\mid\;$ @{term "ASTAR bs r"}
   666                $\,\mid$ @{term "AALTs bs rs"}
   663 	       $\;\mid\;$ @{term "ANTIMES bs r n"}
   667                $\,\mid$ @{term "ASEQ bs r\<^sub>1 r\<^sub>2"}
   664   \end{center}
   668                $\,\mid$ @{term "ASTAR bs r"}
       
   669 	       $\,\mid$ \mbox{@{term "ANTIMES bs r n"}}\hfill\mbox{}
       
   670   \end{minipage}\medskip	       
       
   671   %!\end{center}
   665 
   672 
   666   \noindent where @{text bs} stands for bitsequences; @{text r},
   673   \noindent where @{text bs} stands for bitsequences; @{text r},
   667   @{text "r\<^sub>1"} and @{text "r\<^sub>2"} for bitcoded regular
   674   @{text "r\<^sub>1"} and @{text "r\<^sub>2"} for bitcoded regular
   668   expressions; and @{text rs} for lists of bitcoded regular
   675   expressions; and @{text rs} for lists of bitcoded regular
   669   expressions. The binary alternative @{text "ALT bs r\<^sub>1 r\<^sub>2"}
   676   expressions. The binary alternative @{text "ALT bs r\<^sub>1 r\<^sub>2"}
   675   Sulzmann and Lu follow Nielsen and Henglein \cite{NielsenHenglein2011}
   682   Sulzmann and Lu follow Nielsen and Henglein \cite{NielsenHenglein2011}
   676   and define a coding
   683   and define a coding
   677   function for how values can be coded into bitsequences.
   684   function for how values can be coded into bitsequences.
   678 
   685 
   679   \begin{center}
   686   \begin{center}
   680   \begin{tabular}{cc}
   687   \begin{tabular}{c@ {\hspace{5mm}}c}
   681   \begin{tabular}{lcl}
   688   \begin{tabular}{lcl}
   682   @{thm (lhs) code.simps(1)} & $\dn$ & @{thm (rhs) code.simps(1)}\\
   689   @{thm (lhs) code.simps(1)} & $\dn$ & @{thm (rhs) code.simps(1)}\\
   683   @{thm (lhs) code.simps(2)} & $\dn$ & @{thm (rhs) code.simps(2)}\\
   690   @{thm (lhs) code.simps(2)} & $\dn$ & @{thm (rhs) code.simps(2)}\\
   684   @{thm (lhs) code.simps(3)} & $\dn$ & @{thm (rhs) code.simps(3)}\\
   691   @{thm (lhs) code.simps(3)} & $\dn$ & @{thm (rhs) code.simps(3)}\\
   685   @{thm (lhs) code.simps(4)} & $\dn$ & @{thm (rhs) code.simps(4)}
   692   @{thm (lhs) code.simps(4)} & $\dn$ & @{thm (rhs) code.simps(4)}
   702   @{text S} followed by some bitsequence. Similarly, we use @{text Z} to indicate
   709   @{text S} followed by some bitsequence. Similarly, we use @{text Z} to indicate
   703   if there is still a value coming in the list of @{text Stars}, whereas @{text S}
   710   if there is still a value coming in the list of @{text Stars}, whereas @{text S}
   704   indicates the end of the list. The lossiness makes the process of
   711   indicates the end of the list. The lossiness makes the process of
   705   decoding a bit more involved, but the point is that if we have a
   712   decoding a bit more involved, but the point is that if we have a
   706   regular expression \emph{and} a bitsequence of a corresponding value,
   713   regular expression \emph{and} a bitsequence of a corresponding value,
   707   then we can always decode the value accurately (see Fig.~\ref{decode}).
   714   then we can always decode the value accurately~(see Fig.~\ref{decode}).
   708   The function \textit{decode} checks whether all of the bitsequence is
   715   The function \textit{decode} checks whether all of the bitsequence is
   709   consumed and returns the corresponding value as @{term "Some v"}; otherwise
   716   consumed and returns the corresponding value as @{term "Some v"}; otherwise
   710   it fails with @{text "None"}. We can establish that for a value $v$
   717   it fails with @{text "None"}. We can establish that for a value $v$
   711   inhabited by a regular expression $r$, the decoding of its
   718   inhabited by a regular expression $r$, the decoding of its
   712   bitsequence never fails (see also \cite{NielsenHenglein2011}).
   719   bitsequence never fails (see also \cite{NielsenHenglein2011}).
   713 
   720 
   714   %The decoding can be
   721   %The decoding can be
   715   %defined by using two functions called $\textit{decode}'$ and
   722   %defined by using two functions called $\textit{decode}'$ and
   716   %\textit{decode}:
   723   %\textit{decode}:
   717 
   724 
   718   \begin{figure}
   725   \begin{figure}[t]
   719   \begin{center}
   726   \begin{center}
   720   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}}
   727   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}}
   721   $\textit{decode}'\,bs\,(\ONE)$ & $\;\dn\;$ & $(\Empty, bs)$\\
   728   $\textit{decode}'\,bs\,(\ONE)$ & $\;\dn\;$ & $(\Empty, bs)$\\
   722   $\textit{decode}'\,bs\,(c)$ & $\;\dn\;$ & $(\Char\,c, bs)$\\
   729   $\textit{decode}'\,bs\,(c)$ & $\;\dn\;$ & $(\Char\,c, bs)$\\
   723   $\textit{decode}'\,(\Z\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
   730   $\textit{decode}'\,(\Z\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
   733   $\textit{decode}'\,(\S\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\
   740   $\textit{decode}'\,(\S\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\
   734   $\textit{decode}'\,(\Z\!::\!bs)\,(r^*)$ & $\dn$ & 
   741   $\textit{decode}'\,(\Z\!::\!bs)\,(r^*)$ & $\dn$ & 
   735          $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\
   742          $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\
   736   & &   $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$
   743   & &   $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$
   737         \hspace{2mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\\
   744         \hspace{2mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\\
   738   $\textit{decode}'\,bs\,(r^{\{n\}})$ & $\dn$ & $\textit{decode}'\,bs\,r^*$\smallskip\medskip\\
   745   $\textit{decode}'\,bs\,(r^{\{n\}})$ & $\dn$ & $\textit{decode}'\,bs\,r^*$\smallskip\smallskip\\
   739 
   746 
   740   $\textit{decode}\,bs\,r$ & $\dn$ & 
   747   $\textit{decode}\,bs\,r$ & $\dn$ & 
   741      $\textit{let}\,(v, bs') = \textit{decode}'\,bs\,r\;\textit{in}$\\
   748      $\textit{let}\,(v, bs') = \textit{decode}'\,bs\,r\;\textit{in}$\\
   742   & & $\;\;\;\,\textit{if}\;bs' = []\;\textit{then}\;\textit{Some}\,v\;\textit{else}\;\textit{None}$\\[-4mm]   
   749   & & $\;\;\;\,\textit{if}\;bs' = []\;\textit{then}\;\textit{Some}\,v\;\textit{else}\;\textit{None}$\\[-5mm]   
   743   \end{tabular}    
   750   \end{tabular}    
   744   \end{center}
   751   \end{center}
   745   \caption{Two functions, called $\textit{decode}'$ and \textit{decode}, for decoding a value from a bitsequence with the help of a regular expression.\\[-5mm]}\label{decode}
   752   \caption{Two functions, called $\textit{decode}'$ and \textit{decode}, for decoding a value from a bitsequence with the help of a regular expression.\\[-5mm]}\label{decode}
   746   \end{figure}
   753   \end{figure}
   747 
   754 
   761 
   768 
   762   Sulzmann and Lu define the function \emph{internalise}
   769   Sulzmann and Lu define the function \emph{internalise}
   763   in order to transform (standard) regular expressions into annotated
   770   in order to transform (standard) regular expressions into annotated
   764   regular expressions. We write this operation as $r^\uparrow$.
   771   regular expressions. We write this operation as $r^\uparrow$.
   765   This internalisation uses the following
   772   This internalisation uses the following
   766   \emph{fuse} function.	     
   773   \emph{fuse} function.\medskip
       
   774   
   767   %
   775   %
   768   \begin{center}
   776   %!\begin{center}
   769   \begin{tabular}{@ {}cc@ {}}
   777   \noindent\begin{minipage}{\textwidth}
       
   778   \begin{tabular}{@ {}c@ {\hspace{3mm}}c@ {}}
   770   \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
   779   \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
   771   $\textit{fuse}\,bs\,(\textit{ZERO})$ & $\dn$ & $\textit{ZERO}$\\
   780   $\textit{fuse}\,bs\,(\textit{ZERO})$ & $\dn$ & $\textit{ZERO}$\\
   772   $\textit{fuse}\,bs\,(\textit{ONE}\,bs')$ & $\dn$ &
   781   $\textit{fuse}\,bs\,(\textit{ONE}\,bs')$ & $\dn$ &
   773      $\textit{ONE}\,(bs\,@\,bs')$\\
   782      $\textit{ONE}\,(bs\,@\,bs')$\\
   774   $\textit{fuse}\,bs\,(\textit{CHAR}\,bs'\,c)$ & $\dn$ &
   783   $\textit{fuse}\,bs\,(\textit{CHAR}\,bs'\,c)$ & $\dn$ &
   784      $\textit{STAR}\,(bs\,@\,bs')\,r$\\
   793      $\textit{STAR}\,(bs\,@\,bs')\,r$\\
   785   $\textit{fuse}\,bs\,(\textit{NT}\,bs'\,r\,n)$ & $\dn$ &
   794   $\textit{fuse}\,bs\,(\textit{NT}\,bs'\,r\,n)$ & $\dn$ &
   786      $\textit{NT}\,(bs\,@\,bs')\,r\,n$   
   795      $\textit{NT}\,(bs\,@\,bs')\,r\,n$   
   787   \end{tabular}
   796   \end{tabular}
   788   \end{tabular}
   797   \end{tabular}
   789   \end{center}    
   798   \end{minipage}\medskip
       
   799   %!\end{center}    
   790 
   800 
   791   \noindent
   801   \noindent
   792   A regular expression can then be \emph{internalised} into a bitcoded
   802   A regular expression can then be \emph{internalised} into a bitcoded
   793   regular expression as follows:
   803   regular expression as follows:
   794   %
   804   %
       
   805   %!\begin{center}
       
   806   %!\begin{tabular}{@ {}c@ {\hspace{2mm}}c@ {\hspace{2mm}}c@ {}}
       
   807   %!\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
       
   808   %!$(\ZERO)^\uparrow$ & $\dn$ & $\textit{ZERO}$\\
       
   809   %!$(\ONE)^\uparrow$ & $\dn$ & $\textit{ONE}\,[]$
       
   810   %!\end{tabular}
       
   811   %!&
       
   812   %!\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
       
   813   %!$(c)^\uparrow$ & $\dn$ & $\textit{CHAR}\,[]\,c$\\
       
   814   %!$(r^*)^\uparrow$ & $\dn$ & $\textit{STAR}\;[]\,r^\uparrow$\\
       
   815   %!$(r^{\{n\}})^\uparrow$ & $\dn$ &
       
   816   %!      $\textit{NT}\;[]\,r^\uparrow\,n$
       
   817   %!\end{tabular}
       
   818   %!&
       
   819   %!\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
       
   820   %!$(r_1 + r_2)^\uparrow$ & $\dn$ &
       
   821   %!       $\textit{ALT}\;[]\,(\textit{fuse}\,[\Z]\,r_1^\uparrow)\,
       
   822   %!                          (\textit{fuse}\,[\S]\,r_2^\uparrow)$\\
       
   823   %!$(r_1\cdot r_2)^\uparrow$ & $\dn$ &
       
   824   %!       $\textit{SEQ}\;[]\,r_1^\uparrow\,r_2^\uparrow$
       
   825   %!\end{tabular}
       
   826   %!\end{tabular}
       
   827   %!\end{center}
       
   828   %!
   795   \begin{center}
   829   \begin{center}
   796   \begin{tabular}{@ {}ccc@ {}}
   830   \begin{tabular}{@ {}c@ {\hspace{7mm}}c@ {}}
   797   \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
   831   \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
   798   $(\ZERO)^\uparrow$ & $\dn$ & $\textit{ZERO}$\\
   832   $(\ZERO)^\uparrow$ & $\dn$ & $\textit{ZERO}$\\
   799   $(\ONE)^\uparrow$ & $\dn$ & $\textit{ONE}\,[]$
   833   $(\ONE)^\uparrow$ & $\dn$ & $\textit{ONE}\,[]$\\
   800   \end{tabular}
       
   801   &
       
   802   \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
       
   803   $(c)^\uparrow$ & $\dn$ & $\textit{CHAR}\,[]\,c$\\
   834   $(c)^\uparrow$ & $\dn$ & $\textit{CHAR}\,[]\,c$\\
   804   $(r^*)^\uparrow$ & $\dn$ & $\textit{STAR}\;[]\,r^\uparrow$\\
   835   $(r^*)^\uparrow$ & $\dn$ & $\textit{STAR}\;[]\,r^\uparrow$\\
   805   $(r^{\{n\}})^\uparrow$ & $\dn$ &
       
   806         $\textit{NT}\;[]\,r^\uparrow\,n$
       
   807   \end{tabular}
   836   \end{tabular}
   808   &
   837   &
   809   \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
   838   \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
   810   $(r_1 + r_2)^\uparrow$ & $\dn$ &
   839   $(r_1 + r_2)^\uparrow$ & $\dn$ &
   811          $\textit{ALT}\;[]\,(\textit{fuse}\,[\Z]\,r_1^\uparrow)\,
   840          $\textit{ALT}\;[]\,(\textit{fuse}\,[\Z]\,r_1^\uparrow)\,
   812                             (\textit{fuse}\,[\S]\,r_2^\uparrow)$\\
   841                             (\textit{fuse}\,[\S]\,r_2^\uparrow)$\\
   813   $(r_1\cdot r_2)^\uparrow$ & $\dn$ &
   842   $(r_1\cdot r_2)^\uparrow$ & $\dn$ &
   814          $\textit{SEQ}\;[]\,r_1^\uparrow\,r_2^\uparrow$
   843          $\textit{SEQ}\;[]\,r_1^\uparrow\,r_2^\uparrow$\\
       
   844   $(r^{\{n\}})^\uparrow$ & $\dn$ &
       
   845         $\textit{NT}\;[]\,r^\uparrow\,n$	 
   815   \end{tabular}
   846   \end{tabular}
   816   \end{tabular}
   847   \end{tabular}
   817   \end{center}    
   848   \end{center}    
   818 
   849 
   819   \noindent
   850   \noindent
   824   the functions \textit{bnullable} and \textit{bmkeps}(\textit{s}), which are the
   855   the functions \textit{bnullable} and \textit{bmkeps}(\textit{s}), which are the
   825   ``lifted'' versions of \textit{nullable} and \textit{mkeps} acting on
   856   ``lifted'' versions of \textit{nullable} and \textit{mkeps} acting on
   826   bitcoded regular expressions.
   857   bitcoded regular expressions.
   827   %
   858   %
   828   \begin{center}
   859   \begin{center}
   829   \begin{tabular}{@ {\hspace{-1mm}}c@ {\hspace{0mm}}c@ {}}
   860   \begin{tabular}{@ {\hspace{-1mm}}c@ {\hspace{6mm}}c@ {}}
   830   \begin{tabular}{@ {}l@ {\hspace{0.5mm}}c@ {\hspace{1mm}}l}
   861   \begin{tabular}{@ {}l@ {\hspace{0.5mm}}c@ {\hspace{1mm}}l}
   831   $\textit{bnullable}\,(\textit{ZERO})$ & $\dn$ & $\textit{False}$\\
   862   $\textit{bnullable}\,(\textit{ZERO})$ & $\dn$ & $\textit{False}$\\
   832   $\textit{bnullable}\,(\textit{ONE}\,bs)$ & $\dn$ & $\textit{True}$\\
   863   $\textit{bnullable}\,(\textit{ONE}\,bs)$ & $\dn$ & $\textit{True}$\\
   833   $\textit{bnullable}\,(\textit{CHAR}\,bs\,c)$ & $\dn$ & $\textit{False}$\\
   864   $\textit{bnullable}\,(\textit{CHAR}\,bs\,c)$ & $\dn$ & $\textit{False}$\\
   834   $\textit{bnullable}\,(\textit{ALTs}\,bs\,\rs)$ & $\dn$ &
   865   $\textit{bnullable}\,(\textit{ALTs}\,bs\,\rs)$ & $\dn$ &\\
   835      $\exists\, r \in \rs. \,\textit{bnullable}\,r$\\
   866   \multicolumn{3}{l}{$\quad\exists\, r \in \rs. \,\textit{bnullable}\,r$}\\
   836   $\textit{bnullable}\,(\textit{SEQ}\,bs\,r_1\,r_2)$ & $\dn$ &
   867   $\textit{bnullable}\,(\textit{SEQ}\,bs\,r_1\,r_2)$ & $\dn$ &\\
   837      $\textit{bnullable}\,r_1\wedge \textit{bnullable}\,r_2$\\
   868   \multicolumn{3}{l}{$\quad\textit{bnullable}\,r_1\wedge \textit{bnullable}\,r_2$}\\
   838   $\textit{bnullable}\,(\textit{STAR}\,bs\,r)$ & $\dn$ &
   869   $\textit{bnullable}\,(\textit{STAR}\,bs\,r)$ & $\dn$ &
   839      $\textit{True}$\\
   870      $\textit{True}$\\
   840   $\textit{bnullable}\,(\textit{NT}\,bs\,r\,n)$ & $\dn$ &\\
   871   $\textit{bnullable}\,(\textit{NT}\,bs\,r\,n)$ & $\dn$ &\\
   841   \multicolumn{3}{l}{$\textit{if}\;n = 0\;\textit{then}\;\textit{True}\;
   872   \multicolumn{3}{l}{$\quad\textit{if}\;n = 0\;\textit{then}\;\textit{True}\;
   842   \textit{else}\;\textit{bnullable}\,r$}\\
   873   \textit{else}\;\textit{bnullable}\,r$}\\
   843   \end{tabular}
   874   \end{tabular}
   844   &
   875   &
   845   \begin{tabular}{@ {}l@ {\hspace{0.5mm}}c@ {\hspace{1mm}}l@ {}}
   876   \begin{tabular}{@ {}l@ {\hspace{0.5mm}}c@ {\hspace{1mm}}l@ {}}
   846   $\textit{bmkeps}\,(\textit{ONE}\,bs)$ & $\dn$ & $bs$\\
   877   $\textit{bmkeps}\,(\textit{ONE}\,bs)$ & $\dn$ & $bs$\\
   847   $\textit{bmkeps}\,(\textit{ALTs}\,bs\,\rs)$ & $\dn$ &
   878   $\textit{bmkeps}\,(\textit{ALTs}\,bs\,\rs)$ & $\dn$ &
   848   $bs\,@\,\textit{bmkepss}\,\rs$\\
   879   $bs\,@\,\textit{bmkepss}\,\rs$\\
   849   $\textit{bmkeps}\,(\textit{SEQ}\,bs\,r_1\,r_2)$ & $\dn$ &\\
   880   $\textit{bmkeps}\,(\textit{SEQ}\,bs\,r_1\,r_2)$ & $\dn$ &\\
   850   \multicolumn{3}{l}{$bs \,@\,\textit{bmkeps}\,r_1\,@\, \textit{bmkeps}\,r_2$}\\
   881   \multicolumn{3}{l}{$\quad{}bs \,@\,\textit{bmkeps}\,r_1\,@\, \textit{bmkeps}\,r_2$}\\
   851   $\textit{bmkeps}\,(\textit{STAR}\,bs\,r)$ & $\dn$ &
   882   $\textit{bmkeps}\,(\textit{STAR}\,bs\,r)$ & $\dn$ &
   852      $bs \,@\, [\S]$\\
   883      $bs \,@\, [\S]$\\
   853    $\textit{bmkeps}\,(\textit{NT}\,bs\,r\,n)$ & $\dn$ &\\
   884    $\textit{bmkeps}\,(\textit{NT}\,bs\,r\,n)$ & $\dn$ &\\
   854    \multicolumn{3}{l@ {}}{$\textit{if}\;n=0\;\textit{then}\;bs \,@\, [\S]$}\\
   885    \multicolumn{3}{l@ {}}{$\quad\textit{if}\;n=0\;\textit{then}\;bs \,@\, [\S]$}\\
   855    \multicolumn{3}{l@ {}}{$\textit{else}\,bs \,@\, [\Z] \,@\, \textit{bmkeps}\,r\,@\,
   886    \multicolumn{3}{l@ {}}{$\quad\textit{else}\,bs \,@\, [\Z] \,@\, \textit{bmkeps}\,r\,@\,
   856         \textit{bmkeps}\,(@{term "ANTIMES [] r (n - 1)"})$}\\   
   887         \textit{bmkeps}\,(@{term "ANTIMES [] r (n - 1)"})$}\\   
   857   $\textit{bmkepss}\,(r\!::\!\rs)$ & $\dn$ &\\
   888   $\textit{bmkepss}\,(r\!::\!\rs)$ & $\dn$ &\\
   858   \multicolumn{3}{l}{$\textit{if}\;\textit{bnullable}\,r\,\textit{then}\;\textit{bmkeps}\,r\;
   889   \multicolumn{3}{l}{$\quad\textit{if}\;\textit{bnullable}\,r\,\textit{then}\;\textit{bmkeps}\,r\;
   859   \textit{else}\;\textit{bmkepss}\,\rs$}
   890   \textit{else}\;\textit{bmkepss}\,\rs$}
   860   \end{tabular}
   891   \end{tabular}
   861   \end{tabular}
   892   \end{tabular}
   862   \end{center}    
   893   \end{center}    
   863  
   894  
   867   bitcoded regular expression. This derivative function calculates the
   898   bitcoded regular expression. This derivative function calculates the
   868   derivative but at the same time also the incremental part of the bitsequences
   899   derivative but at the same time also the incremental part of the bitsequences
   869   that contribute to constructing a POSIX value.	
   900   that contribute to constructing a POSIX value.	
   870   % 
   901   % 
   871   \begin{center}
   902   \begin{center}
   872   \begin{tabular}{@ {}lcl@ {}}
   903   \begin{tabular}{@ {}l@ {\hspace{-1mm}}cl@ {}}
   873   $(\textit{ZERO})\backslash c$ & $\;\dn\;$ & $\textit{ZERO}$\\ 
   904   $(\textit{ZERO})\backslash c$ & $\;\dn\;$ & $\textit{ZERO}$\\ 
   874   $(\textit{ONE}\;bs)\backslash c$ & $\;\dn\;$ & $\textit{ZERO}$\\  
   905   $(\textit{ONE}\;bs)\backslash c$ & $\;\dn\;$ & $\textit{ZERO}$\\  
   875   $(\textit{CHAR}\;bs\,d)\backslash c$ & $\dn$ &
   906   $(\textit{CHAR}\;bs\,d)\backslash c$ & $\dn$ &
   876         $\textit{if}\;c=d\; \;\textit{then}\;
   907         $\textit{if}\;c=d\; \;\textit{then}\;
   877          \textit{ONE}\;bs\;\textit{else}\;\textit{ZERO}$\\  
   908          \textit{ONE}\;bs\;\textit{else}\;\textit{ZERO}$\\  
   899   can define Sulzmann and Lu's bitcoded lexer, which we call \textit{blexer}:
   930   can define Sulzmann and Lu's bitcoded lexer, which we call \textit{blexer}:
   900   %
   931   %
   901   \begin{center}
   932   \begin{center}
   902 \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}}
   933 \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}}
   903   $\textit{blexer}\;r\,s$ & $\dn$ &
   934   $\textit{blexer}\;r\,s$ & $\dn$ &
   904       $\textit{let}\;r_{der} = (r^\uparrow)\backslash s\;\textit{in}$                
   935       $\textit{let}\;r_{der} = (r^\uparrow)\backslash s\;\textit{in}$\\                
   905   $\;\;\textit{if}\; \textit{bnullable}(r_{der}) \;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,r_{der})\,r
   936   & & $\;\;\;\textit{if}\; \textit{bnullable}(r_{der})$\\
   906        \;\;\textit{else}\;\textit{None}$
   937   & & $\;\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,r_{der})\,r$\\
       
   938   & & $\;\;\;\textit{else}\;\textit{None}$
   907 \end{tabular}
   939 \end{tabular}
   908 \end{center}
   940 \end{center}
   909 
   941 
   910   \noindent
   942   \noindent
   911 This bitcoded lexer first internalises the regular expression $r$ and then
   943 This bitcoded lexer first internalises the regular expression $r$ and then
   956 is the same as if we first erase the bitcoded regular expression and
   988 is the same as if we first erase the bitcoded regular expression and
   957 then perform the ``standard'' derivative operation.
   989 then perform the ``standard'' derivative operation.
   958 
   990 
   959 \begin{lemma}\label{bnullable}%\mbox{}\smallskip\\
   991 \begin{lemma}\label{bnullable}%\mbox{}\smallskip\\
   960 \textit{(1)} $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\    
   992 \textit{(1)} $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\    
   961 \mbox{\hspace{22mm}}\textit{(2)} $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\
   993 \mbox{\hspace{17mm}}\textit{(2)} $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\
   962 \mbox{\hspace{22mm}}\textit{(3)} $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$
   994 \mbox{\hspace{17mm}}\textit{(3)} $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$
   963 %\begin{tabular}{ll}
   995 %\begin{tabular}{ll}
   964 %\textit{(1)} & $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\    
   996 %\textit{(1)} & $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\    
   965 %\textit{(2)} & $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\
   997 %\textit{(2)} & $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\
   966 %\textit{(3)} & $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$.
   998 %\textit{(3)} & $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$.
   967 %\end{tabular}  
   999 %\end{tabular}  
  1016 that $\textit{flex}$ assembles. The lemma establishes that this is the same
  1048 that $\textit{flex}$ assembles. The lemma establishes that this is the same
  1017 value as if we build the annotated derivative $r^\uparrow\backslash s$
  1049 value as if we build the annotated derivative $r^\uparrow\backslash s$
  1018 and then retrieve the corresponding bitcoded version, followed by a
  1050 and then retrieve the corresponding bitcoded version, followed by a
  1019 decoding step.
  1051 decoding step.
  1020 
  1052 
  1021 \begin{lemma}[Main Lemma]\label{mainlemma}\it
  1053 \begin{lemma}[Main Lemma]\label{mainlemma}\mbox{}\\\it
  1022 If $\vdash v : r\backslash s$ then 
  1054 If $\vdash v : r\backslash s$ then 
  1023 $\textit{Some}\,(\textit{flex}\,r\,\textit{id}\,s\,v) =
  1055 $\textit{Some}\,(\textit{flex}\,r\,\textit{id}\,s\,v) =
  1024   \textit{decode}(\textit{retrieve}\,(r^\uparrow \backslash s)\,v)\,r$
  1056   \textit{decode}(\textit{retrieve}\,(r^\uparrow \backslash s)\,v)\,r$
  1025 \end{lemma}  
  1057 \end{lemma}  
  1026 
  1058 
  1079      spawn new copies in later derivative steps and they in turn even more copies. This
  1111      spawn new copies in later derivative steps and they in turn even more copies. This
  1080      destroys any hope of taming the size of the derivatives.  But the
  1112      destroys any hope of taming the size of the derivatives.  But the
  1081      second copy of $r$ in \eqref{derivex} will never contribute to a
  1113      second copy of $r$ in \eqref{derivex} will never contribute to a
  1082      value, because POSIX lexing will always prefer matching a string
  1114      value, because POSIX lexing will always prefer matching a string
  1083      with the first copy. So it could be safely removed without affecting the correctness of the algorithm.
  1115      with the first copy. So it could be safely removed without affecting the correctness of the algorithm.
  1084      The dilemma with the simple-minded
  1116      The issue with the simple-minded
  1085      simplification rules above is that the rule $r + r \Rightarrow r$
  1117      simplification rules above is that the rule $r + r \Rightarrow r$
  1086      will never be applicable because as can be seen in this example the
  1118      will never be applicable because as can be seen in this example the
  1087      regular expressions are not next to each other but separated by another regular expression.
  1119      regular expressions are not next to each other but separated by another regular expression.
  1088 
  1120 
  1089      But here is where Sulzmann and Lu's representation of generalised
  1121      But here is where Sulzmann and Lu's representation of generalised
  1133      acc"}} where @{text eq} is an user-defined equivalence relation that
  1165      acc"}} where @{text eq} is an user-defined equivalence relation that
  1134      compares two elements in @{text rs}. We define this function in
  1166      compares two elements in @{text rs}. We define this function in
  1135      Isabelle/HOL as
  1167      Isabelle/HOL as
  1136 
  1168 
  1137      \begin{center}
  1169      \begin{center}
  1138      \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {\hspace{1mm}}l@ {}}
  1170      \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}}
  1139      @{thm (lhs) distinctWith.simps(1)} & $\dn$ & @{thm (rhs) distinctWith.simps(1)}\\
  1171      @{thm (lhs) distinctWith.simps(1)} & $\dn$ & @{thm (rhs) distinctWith.simps(1)}\\
  1140      @{thm (lhs) distinctWith.simps(2)} & $\dn$ &
  1172      @{thm (lhs) distinctWith.simps(2)} & $\dn$ &
  1141      @{text "if (\<exists> y \<in> acc. eq x y)"} & @{text "then distinctWith xs eq acc"}\\
  1173      @{text "if (\<exists> y \<in> acc. eq x y)"} \\
  1142      & & & @{text "else x :: distinctWith xs eq ({x} \<union> acc)"}
  1174      & & @{text "then distinctWith xs eq acc"}\\
       
  1175      & & @{text "else x :: distinctWith xs eq ({x} \<union> acc)"}
  1143      \end{tabular}
  1176      \end{tabular}
  1144      \end{center}
  1177      \end{center}
  1145 
  1178 
  1146      \noindent where we scan the list from left to right (because we
  1179      \noindent where we scan the list from left to right (because we
  1147      have to remove later copies). In @{text distinctWith}, @{text eq} is intended to be an
  1180      have to remove later copies). In @{text distinctWith}, @{text eq} is intended to be an
  1183      \begin{center}
  1216      \begin{center}
  1184      \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  1217      \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  1185      \multicolumn{3}{@ {}c}{@{thm (lhs) flts.simps(1)} $\dn$ @{thm (rhs) flts.simps(1)} \qquad\qquad\qquad\qquad
  1218      \multicolumn{3}{@ {}c}{@{thm (lhs) flts.simps(1)} $\dn$ @{thm (rhs) flts.simps(1)} \qquad\qquad\qquad\qquad
  1186      @{thm (lhs) flts.simps(2)} $\dn$ @{thm (rhs) flts.simps(2)}}\smallskip\\
  1219      @{thm (lhs) flts.simps(2)} $\dn$ @{thm (rhs) flts.simps(2)}}\smallskip\\
  1187      @{thm (lhs) flts.simps(3)[of "bs'" "rs'"]} & $\dn$ & @{thm (rhs) flts.simps(3)[of "bs'" "rs'"]}\\
  1220      @{thm (lhs) flts.simps(3)[of "bs'" "rs'"]} & $\dn$ & @{thm (rhs) flts.simps(3)[of "bs'" "rs'"]}\\
       
  1221      \multicolumn{3}{@ {}c}{@{text "flts (r :: rs)"} $\dn$ @{text "r :: flts rs"}
       
  1222      \hspace{5mm}(otherwise)}
  1188      \end{tabular}
  1223      \end{tabular}
  1189      \end{center}
  1224      \end{center}
  1190 
  1225 
  1191      \noindent
  1226      \noindent
  1192      The second clause of @{text flts} removes all instances of @{text ZERO} in alternatives and
  1227      The second clause of @{text flts} removes all instances of @{text ZERO} in alternatives and
  1196      empty or a singleton list. We take care of those cases in the
  1231      empty or a singleton list. We take care of those cases in the
  1197      @{text "bsimpALTs"} function; similarly we define a helper function that simplifies
  1232      @{text "bsimpALTs"} function; similarly we define a helper function that simplifies
  1198      sequences according to the usual rules about @{text ZERO}s and @{text ONE}s:
  1233      sequences according to the usual rules about @{text ZERO}s and @{text ONE}s:
  1199 
  1234 
  1200      \begin{center}
  1235      \begin{center}
  1201      \begin{tabular}{c@ {\hspace{5mm}}c}
  1236      \begin{tabular}{@ {}c@ {\hspace{4mm}}c@ {}}
  1202      \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  1237      \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}}
  1203      @{text "bsimpALTs bs []"}  & $\dn$ & @{text "ZERO"}\\
  1238      @{text "bsimpALTs bs []"}  & $\dn$ & @{text "ZERO"}\\
  1204      @{text "bsimpALTs bs [r]"} & $\dn$ & @{text "fuse bs r"}\\
  1239      @{text "bsimpALTs bs [r]"} & $\dn$ & @{text "fuse bs r"}\\
  1205      @{text "bsimpALTs bs rs"}  & $\dn$ & @{text "ALTs bs rs"}\\
  1240      @{text "bsimpALTs bs rs"}  & $\dn$ & @{text "ALTs bs rs"}\\
  1206      \mbox{}\\
  1241      \mbox{}\\
  1207      \end{tabular}
  1242      \end{tabular}
  1208      &
  1243      &
  1209      \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  1244      \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}}
  1210      @{text "bsimpSEQ bs _ ZERO"}  & $\dn$ & @{text "ZERO"}\\
  1245      @{text "bsimpSEQ bs _ ZERO"}  & $\dn$ & @{text "ZERO"}\\
  1211      @{text "bsimpSEQ bs ZERO _"} & $\dn$ & @{text "ZERO"}\\
  1246      @{text "bsimpSEQ bs ZERO _"} & $\dn$ & @{text "ZERO"}\\
  1212      @{text "bsimpSEQ bs\<^sub>1 (ONE bs\<^sub>2) r\<^sub>2"}
  1247      @{text "bsimpSEQ bs\<^sub>1 (ONE bs\<^sub>2) r\<^sub>2"}
  1213          & $\dn$ & @{text "fuse (bs\<^sub>1 @ bs\<^sub>2) r\<^sub>2"}\\
  1248          & $\dn$ & @{text "fuse (bs\<^sub>1 @ bs\<^sub>2) r\<^sub>2"}\\
  1214      @{text "bsimpSEQ bs r\<^sub>1 r\<^sub>2"} & $\dn$ &  @{text "SEQ bs r\<^sub>1 r\<^sub>2"}
  1249      @{text "bsimpSEQ bs r\<^sub>1 r\<^sub>2"} & $\dn$ &  @{text "SEQ bs r\<^sub>1 r\<^sub>2"}
  1265       and use it in the improved lexing algorithm defined as
  1300       and use it in the improved lexing algorithm defined as
  1266 
  1301 
  1267      \begin{center}
  1302      \begin{center}
  1268 \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}}
  1303 \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}}
  1269   $\textit{blexer}^+\;r\,s$ & $\dn$ &
  1304   $\textit{blexer}^+\;r\,s$ & $\dn$ &
  1270       $\textit{let}\;r_{der} = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$                
  1305       $\textit{let}\;r_{der} = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\                
  1271       $\;\;\textit{if}\; \textit{bnullable}(r_{der}) \;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,r_{der})\,r
  1306       & & $\;\;\textit{if}\; \textit{bnullable}(r_{der})$\\
  1272        \;\;\textit{else}\;\textit{None}$
  1307       & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,r_{der})\,r$\\
       
  1308       & & $\;\;\textit{else}\;\textit{None}$
  1273 \end{tabular}
  1309 \end{tabular}
  1274 \end{center}
  1310 \end{center}
  1275 
  1311 
  1276        \noindent The remaining task is to show that @{term blexer} and
  1312        \noindent The remaining task is to show that @{term blexer} and
  1277        @{term "blexer_simp"} generate the same answers.
  1313        @{term "blexer_simp"} generate the same answers.
  1419   expressions to be removed provided a regular expression earlier in the list can
  1455   expressions to be removed provided a regular expression earlier in the list can
  1420   match the same strings.}\label{SimpRewrites}
  1456   match the same strings.}\label{SimpRewrites}
  1421   \end{figure}
  1457   \end{figure}
  1422 *}
  1458 *}
  1423 
  1459 
  1424 section {* Finiteness of Derivatives *}
  1460 section {* Finite Bound for the Size of Derivatives *}
  1425 
  1461 
  1426 text {*
  1462 text {*
  1427 
  1463 
  1428 In this section let us sketch our argument for why the size of the simplified
  1464 In this section let us sketch our argument for why the size of the simplified
  1429 derivatives with the aggressive simplification function can be finitely bounded. Suppose
  1465 derivatives with the aggressive simplification function can be finitely bounded. Suppose
  1441 Note that the bound $N$ is a bound for \emph{all} strings, no matter how long they are.
  1477 Note that the bound $N$ is a bound for \emph{all} strings, no matter how long they are.
  1442 We establish this bound by induction on $r$. The base cases for @{term AZERO},
  1478 We establish this bound by induction on $r$. The base cases for @{term AZERO},
  1443 @{term "AONE bs"} and @{term "ACHAR bs c"} are straightforward. The interesting case is
  1479 @{term "AONE bs"} and @{term "ACHAR bs c"} are straightforward. The interesting case is
  1444 for sequences of the form @{term "ASEQ bs r\<^sub>1 r\<^sub>2"}. In this case our induction
  1480 for sequences of the form @{term "ASEQ bs r\<^sub>1 r\<^sub>2"}. In this case our induction
  1445 hypotheses state $\exists N_1. \forall s. \; \llbracket{}@{term "bders_simp r\<^sub>1 s"}\rrbracket \leq N_1$ and
  1481 hypotheses state $\exists N_1. \forall s. \; \llbracket{}@{term "bders_simp r\<^sub>1 s"}\rrbracket \leq N_1$ and
  1446 $\exists N_2. \forall s. \; \llbracket@{term "bders_simp r\<^sub>2 s"}\rrbracket \leq N_2$. We can reason as follows
  1482 $\exists N_2. \forall s. \; \llbracket@{term "bders_simp r\<^sub>2 s"}\rrbracket \leq N_2$. We can reason as follows\medskip
  1447 %
  1483 %
  1448 \begin{center}
  1484 %!\begin{center}
       
  1485 
       
  1486 \noindent\begin{minipage}{\textwidth}
  1449 \begin{tabular}{lcll}
  1487 \begin{tabular}{lcll}
  1450 & & $ \llbracket@{term "bders_simp (ASEQ bs r\<^sub>1 r\<^sub>2) s"}\rrbracket$\\
  1488 & & $ \llbracket@{term "bders_simp (ASEQ bs r\<^sub>1 r\<^sub>2) s"}\rrbracket$\\
  1451 & $ = $ & $\llbracket bsimp\,(\textit{ALTs}\;bs\;((@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}) ::
  1489 & $ = $ & $\llbracket bsimp\,(\textit{ALTs}\;bs\;((@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}) ::
  1452     [@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suffix}(@{text r}_1, s)]))\rrbracket $ & (1) \\
  1490     [@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suf}(@{text r}_1, s)]))\rrbracket $ & (1) \\
  1453 & $\leq$ &
  1491 & $\leq$ &
  1454     $\llbracket\textit{distinctWith}\,((@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}) ::
  1492     $\llbracket\textit{distinctWith}\,((@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}) ::
  1455     [@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suffix}(@{text r}_1, s)])\,\approx\;@{term "{}"}\rrbracket + 1 $ & (2) \\
  1493     [@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suf}(@{text r}_1, s)])\,\approx\;@{term "{}"}\rrbracket + 1 $ & (2) \\
  1456 & $\leq$ & $\llbracket@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}\rrbracket +
  1494 & $\leq$ & $\llbracket@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}\rrbracket +
  1457              \llbracket\textit{distinctWith}\,[@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suffix}(@{text r}_1, s)]\,\approx\;@{term "{}"}\rrbracket + 1 $ & (3) \\
  1495              \llbracket\textit{distinctWith}\,[@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suf}(@{text r}_1, s)]\,\approx\;@{term "{}"}\rrbracket + 1 $ & (3) \\
  1458 & $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 +
  1496 & $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 +
  1459       \llbracket\textit{distinctWith}\,[@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suffix}(@{text r}_1, s)]\,\approx\;@{term "{}"}\rrbracket$ & (4)\\
  1497       \llbracket\textit{distinctWith}\,[@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suf}(@{text r}_1, s)]\,\approx\;@{term "{}"}\rrbracket$ & (4)\\
  1460 & $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 + l_{N_{2}} * N_{2}$ & (5)
  1498 & $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 + l_{N_{2}} * N_{2}$ & (5)
  1461 \end{tabular}
  1499 \end{tabular}
  1462 \end{center}
  1500 \end{minipage}\medskip
       
  1501 %!\end{center}
  1463 
  1502 
  1464 % tell Chengsong about Indian paper of closed forms of derivatives
  1503 % tell Chengsong about Indian paper of closed forms of derivatives
  1465 
  1504 
  1466 \noindent
  1505 \noindent
  1467 where in (1) the $\textit{Suffix}(@{text "r"}_1, s)$ are all the suffixes of $s$ where @{term "bders_simp r\<^sub>1 s'"} is nullable ($s'$ being a suffix of $s$).
  1506 where in (1) the $\textit{Suf}(@{text "r"}_1, s)$ are all the suffixes of $s$ where @{term "bders_simp r\<^sub>1 s'"} is nullable ($s'$ being a suffix of $s$).
  1468 In (3) we know that  $\llbracket@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}\rrbracket$ is 
  1507 In (3) we know that  $\llbracket@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}\rrbracket$ is 
  1469 bounded by $N_1 + \llbracket{}r_2\rrbracket + 1$. In (5) we know the list comprehension contains only regular expressions of size smaller
  1508 bounded by $N_1 + \llbracket{}r_2\rrbracket + 1$. In (5) we know the list comprehension contains only regular expressions of size smaller
  1470 than $N_2$. The list length after @{text distinctWith} is bounded by a number, which we call $l_{N_2}$. It stands
  1509 than $N_2$. The list length after @{text distinctWith} is bounded by a number, which we call $l_{N_2}$. It stands
  1471 for the number of distinct regular expressions smaller than $N_2$ (there can only be finitely many of them).
  1510 for the number of distinct regular expressions smaller than $N_2$ (there can only be finitely many of them).
  1472 We reason similarly for @{text STAR}.\smallskip
  1511 We reason similarly for @{text STAR}.\smallskip
  1500 \]
  1539 \]
  1501 
  1540 
  1502 \noindent
  1541 \noindent
  1503 where the @{text "Left"}-alternatives get priority. However, this violates
  1542 where the @{text "Left"}-alternatives get priority. However, this violates
  1504 the POSIX rules and we have not been able to
  1543 the POSIX rules and we have not been able to
  1505 reconcile this problem. Therefore we leave better bounds for future work.\\[-6.5mm]
  1544 reconcile this problem. Therefore we leave better bounds for future work.%!\\[-6.5mm]
       
  1545 
       
  1546 Note that while Antimirov was able to give a bound on the \emph{size}
       
  1547 of his partial derivatives~\cite{Antimirov95}, Brzozowski gave a bound
       
  1548 on the \emph{number} of derivatives, provided they are quotient via
       
  1549 ACI rules \cite{Brzozowski1964}. Brozozowski's result is crucial when one
       
  1550 uses derivatives for obtaining an automaton (it essentially bounds
       
  1551 the number of states). However, this result does \emph{not}
       
  1552 transfer to our setting where we are interested in the \emph{size} of the
       
  1553 derivatives. For example it is not true for our derivatives that the
       
  1554 set of of derivatives $r \backslash s$ for a given $r$ and all strings
       
  1555 $s$ is finite. This is because for our annotated regular expressions
       
  1556 the bitcode annotation is dependent on the number of iterations that
       
  1557 are needed for STAR-regular expressions. Since we want to do lexing
       
  1558 by calculating (as fast as possible) derivatives, the bound on the size
       
  1559 of the derivatives is important, not the number of derivatives.
       
  1560 
  1506 
  1561 
  1507 *}
  1562 *}
  1508 
  1563 
  1509 
  1564 
  1510 section {* Conclusion *}
  1565 section {* Conclusion *}
  1636    %\\[-10mm]
  1691    %\\[-10mm]
  1637 
  1692 
  1638 
  1693 
  1639 %%\bibliographystyle{plain}
  1694 %%\bibliographystyle{plain}
  1640 \bibliography{root}
  1695 \bibliography{root}
  1641 
  1696 *}
       
  1697 
       
  1698 (*<*)
       
  1699 text {*
  1642 \newpage
  1700 \newpage
  1643 \appendix
  1701 \appendix
  1644 
  1702 
  1645 \section{Some Proofs}
  1703 \section{Some Proofs}
  1646 
  1704 
  1737      By straightforward mutual induction on the definition of $\leadsto$ and $\stackrel{s}{\leadsto}$.
  1795      By straightforward mutual induction on the definition of $\leadsto$ and $\stackrel{s}{\leadsto}$.
  1738      The case for $LD$ holds because @{term "L (erase (bder c r\<^sub>2)) \<subseteq> L (erase (bder c r\<^sub>1))"}
  1796      The case for $LD$ holds because @{term "L (erase (bder c r\<^sub>2)) \<subseteq> L (erase (bder c r\<^sub>1))"}
  1739      if and only if @{term "L (erase (r\<^sub>2)) \<subseteq> L (erase (r\<^sub>1))"}.
  1797      if and only if @{term "L (erase (r\<^sub>2)) \<subseteq> L (erase (r\<^sub>1))"}.
  1740      \end{proof}
  1798      \end{proof}
  1741 *}
  1799 *}
  1742 
  1800 (*>*)
  1743 (*<*)
  1801 (*<*)
  1744 end
  1802 end
  1745 (*>*)
  1803 (*>*)