ChengsongTanPhdThesis/Chapters/Bitcoded1.tex
changeset 575 3178f0e948ac
parent 564 3cbcd7cda0a9
child 576 3e1b699696b6
equal deleted inserted replaced
574:692911c0b981 575:3178f0e948ac
   176 relatively small size of bits can be easily manipulated and ``moved
   176 relatively small size of bits can be easily manipulated and ``moved
   177 around" in a regular expression. 
   177 around" in a regular expression. 
   178 
   178 
   179 Because of the lossiness, the process of decoding a bitlist requires additionally 
   179 Because of the lossiness, the process of decoding a bitlist requires additionally 
   180 a regular expression. The function $\decode$ is defined as:
   180 a regular expression. The function $\decode$ is defined as:
   181 We define the reverse operation of $\code$, which is $\decode$.
       
   182 As expected, $\decode$ not only requires the bit-codes,
       
   183 but also a regular expression to guide the decoding and 
       
   184 fill the gaps of characters:
       
   185 
       
   186 
   181 
   187 %\begin{definition}[Bitdecoding of Values]\mbox{}
   182 %\begin{definition}[Bitdecoding of Values]\mbox{}
   188 \begin{center}
   183 \begin{center}
   189 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
   184 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
   190   $\textit{decode}'\,bs\,(\ONE)$ & $\dn$ & $(\Empty, bs)$\\
   185   $\textit{decode}'\,bs\,(\ONE)$ & $\dn$ & $(\Empty, bs)$\\
   212 \end{tabular}    
   207 \end{tabular}    
   213 \end{center} 
   208 \end{center} 
   214 %\end{definition}
   209 %\end{definition}
   215 
   210 
   216 \noindent
   211 \noindent
   217 The function $\decode'$ returns a pair consisting of a partially decoded value and some leftover:
   212 The function $\decode'$ returns a pair consisting of 
   218 $\decode'$ does most of the job while $\decode$ throws
   213 a partially decoded value and some leftover bit list that cannot
   219 away leftover bit-codes and returns the value only.
   214 be decide yet.
       
   215 The function $\decode'$ succeeds if the left-over 
       
   216 bit-sequence is empty.
   220 $\decode$ is terminating as $\decode'$ is terminating.
   217 $\decode$ is terminating as $\decode'$ is terminating.
   221 We have the property that $\decode$ and $\code$ are
   218 $\decode'$ is terminating 
       
   219 because at least one of $\decode'$'s parameters will go down in terms
       
   220 of size.
       
   221 Assuming we have a value $v$ and regular expression $r$
       
   222 with $\vdash v:r$,
       
   223 then we have the property that $\decode$ and $\code$ are
   222 reverse operations of one another:
   224 reverse operations of one another:
   223 \begin{lemma}
   225 \begin{lemma}
   224 \[\vdash v : r \implies \decode \; (\code \; v) \; r = \textit{Some}(v) \]
   226 \[If \vdash v : r \; then \;\decode \; (\code \; v) \; r = \textit{Some}(v) \]
   225 \end{lemma}
   227 \end{lemma}
   226 \begin{proof}
   228 \begin{proof}
   227 By proving a more general version of the lemma, on $\decode'$:
   229 By proving a more general version of the lemma, on $\decode'$:
   228 \[\vdash v : r \implies \decode' \; ((\code \; v) @ ds) \; r = (v, ds) \]
   230 \[\vdash v : r \implies \decode' \; ((\code \; v) @ ds) \; r = (v, ds) \]
   229 Then setting $ds$ to be $[]$ and unfolding $\decode$ definition
   231 Then setting $ds$ to be $[]$ and unfolding $\decode$ definition,
   230 we get the lemma.
   232 we obtain the property.
   231 \end{proof}
   233 \end{proof}
   232 With the $\code$ and $\decode$ functions in hand, we know how to 
   234 With the $\code$ and $\decode$ functions in hand, we know how to 
   233 switch between bit-codes and value--the two different representations of 
   235 switch between bit-codes and values. 
   234 lexing information. 
   236 The next step is to integrate this information into regular expression.
   235 The next step is to integrate this information into the working regular expression.
       
   236 Attaching bits to the front of regular expressions is the solution Sulzamann and Lu
   237 Attaching bits to the front of regular expressions is the solution Sulzamann and Lu
   237 gave for storing partial values on the fly:
   238 gave for storing partial values in regular expressions. 
       
   239 Annotated regular expressions are therefore defined as the Isabelle
       
   240 datatype:
   238 
   241 
   239 \begin{center}
   242 \begin{center}
   240 \begin{tabular}{lcl}
   243 \begin{tabular}{lcl}
   241   $\textit{a}$ & $::=$  & $\ZERO$\\
   244   $\textit{a}$ & $::=$  & $\ZERO$\\
   242                   & $\mid$ & $_{bs}\ONE$\\
   245                   & $\mid$ & $_{bs}\ONE$\\
   247 \end{tabular}    
   250 \end{tabular}    
   248 \end{center}  
   251 \end{center}  
   249 %(in \textit{ALTS})
   252 %(in \textit{ALTS})
   250 
   253 
   251 \noindent
   254 \noindent
   252 We call these regular expressions carrying bit-codes \emph{Annotated regular expressions}.
   255 where $bs$ stands for bit-codes, $a$  for $\mathbf{a}$nnotated regular
   253 $bs$ stands for bit-codes, $a$  for $\mathbf{a}$nnotated regular
   256 expressions and $as$ for lists of annotated regular expressions.
   254 expressions and $as$ for a list of annotated regular expressions.
   257 The alternative constructor, written, $\sum$, has been generalised to 
   255 The alternative constructor ($\sum$) has been generalised to 
   258 accept a list of annotated regular expressions rather than just two.
   256 accept a list of annotated regular expressions rather than just 2.
   259 Why is it generalised? This is because when we open up nested 
   257 
   260 alternatives, there could be more than two elements at the same level
   258 
   261 after de-duplication, which can no longer be stored in a binary
   259 The first thing we define related to bit-coded regular expressions
   262 constructor.
   260 is how we move bits, for instance pasting it at the front of an annotated regular expression.
   263 
   261 The operation $\fuse$ is just to attach bit-codes 
   264 The first operation we define related to bit-coded regular expressions
       
   265 is how we move bits to the inside of regular expressions.
       
   266 Called $\fuse$, this operation is attaches bit-codes 
   262 to the front of an annotated regular expression:
   267 to the front of an annotated regular expression:
   263 \begin{center}
   268 \begin{center}
   264 \begin{tabular}{lcl}
   269 \begin{tabular}{lcl}
   265   $\textit{fuse}\;bs \; \ZERO$ & $\dn$ & $\ZERO$\\
   270   $\textit{fuse}\;bs \; \ZERO$ & $\dn$ & $\ZERO$\\
   266   $\textit{fuse}\;bs\; _{bs'}\ONE$ & $\dn$ &
   271   $\textit{fuse}\;bs\; _{bs'}\ONE$ & $\dn$ &
   275      $_{bs @ bs'}a^*$
   280      $_{bs @ bs'}a^*$
   276 \end{tabular}    
   281 \end{tabular}    
   277 \end{center} 
   282 \end{center} 
   278 
   283 
   279 \noindent
   284 \noindent
   280 With that we are able to define $\internalise$.
   285 With \emph{fuse} we are able to define the $\internalise$ function
   281 
   286 that translates a ``standard'' regular expression into an
   282 To do lexing using annotated regular expressions, we shall first
   287 annotated regular expression.
   283 transform the usual (un-annotated) regular expressions into annotated
   288 This function will be applied before we start
   284 regular expressions. This operation is called \emph{internalisation} and
   289 with the derivative phase of the algorithm.
   285 defined as follows:
   290 
   286 
       
   287 %\begin{definition}
       
   288 \begin{center}
   291 \begin{center}
   289 \begin{tabular}{lcl}
   292 \begin{tabular}{lcl}
   290   $(\ZERO)^\uparrow$ & $\dn$ & $\ZERO$\\
   293   $(\ZERO)^\uparrow$ & $\dn$ & $\ZERO$\\
   291   $(\ONE)^\uparrow$ & $\dn$ & $_{[]}\ONE$\\
   294   $(\ONE)^\uparrow$ & $\dn$ & $_{[]}\ONE$\\
   292   $(c)^\uparrow$ & $\dn$ & $_{[]}{\bf c}$\\
   295   $(c)^\uparrow$ & $\dn$ & $_{[]}{\bf c}$\\
   301 \end{center}    
   304 \end{center}    
   302 %\end{definition}
   305 %\end{definition}
   303 
   306 
   304 \noindent
   307 \noindent
   305 We use an up arrow with postfix notation
   308 We use an up arrow with postfix notation
   306 to denote the operation, 
   309 to denote this operation.
   307 for convenience. The $\textit{internalise} \; r$
   310 for convenience. The $\textit{internalise} \; r$
   308 notation is more cumbersome.
   311 notation is more cumbersome.
   309 The opposite of $\textit{internalise}$ is
   312 The opposite of $\textit{internalise}$ is
   310 $\erase$, where all the bit-codes are removed,
   313 $\erase$, where all the bit-codes are removed,
   311 and the alternative operator $\sum$ for annotated
   314 and the alternative operator $\sum$ for annotated
   312 regular expressions is transformed to the binary alternatives
   315 regular expressions is transformed to the binary alternatives
   313 for plain regular expressions.
   316 for plain regular expressions.
   314 \begin{center}
   317 \begin{center}
   315 	\begin{tabular}{lcr}
   318 	\begin{tabular}{lcl}
   316 		$\erase \; \ZERO$ & $\dn$ & $\ZERO$\\
   319 		$\ZERO_\downarrow$ & $\dn$ & $\ZERO$\\
   317 		$\erase \; _{bs}\ONE$ & $\dn$ & $\ONE$\\
   320 		$( _{bs}\ONE )_\downarrow$ & $\dn$ & $\ONE$\\
   318 		$\erase \; _{bs}\mathbf{c}$ & $\dn$ & $\mathbf{c}$\\
   321 		$( _{bs}\mathbf{c} )_\downarrow$ & $\dn$ & $\mathbf{c}$\\
   319 		$\erase \; _{bs} a_1 \cdot a_2$ & $\dn$ & $\erase \; r_1\cdot\erase\; r_2$\\
   322 		$( _{bs} a_1 \cdot a_2 )_\downarrow$ & $\dn$ & 
   320 		$\erase \; _{bs} [] $ & $\dn$ & $\ZERO$\\
   323 		$ (a_1) _\downarrow \cdot  (a_2) _\downarrow$\\
   321 		$\erase \; _{bs} [a] $ & $\dn$ & $\erase \; a$\\
   324 		$( _{bs} [])_\downarrow $ & $\dn$ & $\ZERO $\\
   322 		$\erase \; _{bs} \sum [a_1, \; a_2]$ & $\dn$ & $\erase \; a_1 +\erase \; a_2$\\
   325 		$( _{bs} [a]  )_\downarrow$ & $\dn$ & $a_\downarrow$\\
   323 		$\erase \; _{bs} \sum (a :: as)$ & $\dn$ & $\erase \; a + \erase \; _{[]} \sum as$\\
   326 		$_{bs} \sum [a_1, \; a_2]$ & $\dn$ & $ (a_1) _\downarrow + ( a_2 ) _\downarrow $\\
   324 		$\erase \; _{bs} a^*$ & $\dn$ & $(\erase \; a)^*$
   327 		$(_{bs} \sum (a :: as))_\downarrow$ & $\dn$ & $ a_\downarrow + \; (_{[]} \sum as)_\downarrow$\\
       
   328 		$( _{bs} a^* )_\downarrow$ & $\dn$ & $(a_\downarrow)^*$
   325 	\end{tabular}
   329 	\end{tabular}
   326 \end{center}
   330 \end{center}
   327 \noindent
   331 \noindent
   328 We also abbreviate the $\erase\; a$ operation
   332 We also abbreviate the $\erase\; a$ operation
   329 as $a_\downarrow$, for conciseness.
   333 as $a_\downarrow$, for conciseness.
   330 For bit-coded regular expressions, as a different datatype, 
   334 For bit-coded regular expressions, as a different datatype, 
   331 testing whether they contain empty string in their lauguage requires
   335 testing whether they contain empty string in their lauguage requires
   332 a dedicated function $\bnullable$
   336 a dedicated function $\bnullable$
   333 which simply calls $\erase$ first before testing whether it is $\nullable$.
   337 which simply calls $\erase$ first before testing whether it is $\nullable$.
   334 \begin{center}
   338 \begin{definition}
   335 	\begin{tabular}{lcr}
   339 		$\bnullable \; a \dn  \nullable \; (a_\downarrow)$
   336 		$\bnullable \; a $ & $\dn$ & $\nullable \; (a_\downarrow)$
   340 \end{definition}
   337 	\end{tabular}
       
   338 \end{center}
       
   339 The function for collecting the
   341 The function for collecting the
   340 bitcodes of a $\bnullable$ annotated regular expression
   342 bitcodes at the end of the derivative 
       
   343 phase from a (b)nullable regular expression
   341 is a generalised version of the $\textit{mkeps}$ function
   344 is a generalised version of the $\textit{mkeps}$ function
   342 for annotated regular expressions, called $\textit{bmkeps}$:
   345 for annotated regular expressions, called $\textit{bmkeps}$:
   343 
   346 
   344 
   347 
   345 %\begin{definition}[\textit{bmkeps}]\mbox{}
   348 %\begin{definition}[\textit{bmkeps}]\mbox{}
   357 \end{tabular}    
   360 \end{tabular}    
   358 \end{center}    
   361 \end{center}    
   359 %\end{definition}
   362 %\end{definition}
   360 
   363 
   361 \noindent
   364 \noindent
   362 This function completes the value information by travelling along the
   365 $\bmkeps$ completes the value information by travelling along the
   363 path of the regular expression that corresponds to a POSIX value and
   366 path of the regular expression that corresponds to a POSIX value and
   364 collecting all the bitcodes, and using $S$ to indicate the end of star
   367 collecting all the bitcodes, and attaching $S$ to indicate the end of star
   365 iterations. 
   368 iterations. 
   366 
   369 
   367 The most central question is how these partial lexing information
   370 Now we give out the central part of this lexing algorithm,
   368 represented as bit-codes is augmented and carried around 
   371 the $\bder$ function (stands for \emph{b}itcoded-derivative).
   369 during a derivative is taken.
   372 For most time we use the infix notation $(\_\backslash\_)$ 
   370 This is done by adding bitcodes to the 
   373 to mean $\bder$ for brevity when
   371 derivatives, for example when one more star iteratoin is taken (we
   374 there is no danger of confusion with derivatives on plain regular expressions.
   372 call the operation of derivatives on annotated regular expressions $\bder$
   375 For example, we write $( _{[]}r^* ) \backslash c$ instead of $\bder \;c \; _{[]}r^*$,
   373 because it is derivatives on regular expressiones with \emph{b}itcodes),
   376 as the bitcodes at the front of $r^*$ indicates that it is 
   374 we need to unfold it into a sequence,
   377 a bit-coded regular expression, not a plain one.
   375 and attach an additional bit $Z$ to the front of $r \backslash c$
   378 $\bder$ tells us how regular expressions can be recursively traversed,
   376 to indicate one more star iteration. 
   379 where the bitcodes are augmented and carried around 
   377 \begin{center}
   380 when a derivative is taken.
   378   \begin{tabular}{@{}lcl@{}}
       
   379   $\bder \; c\; (_{bs}a^*) $ & $\dn$ &
       
   380       $_{bs}(\textit{fuse}\, [Z] \; \bder \; c \; a)\cdot
       
   381        (_{[]}a^*))$
       
   382 \end{tabular}    
       
   383 \end{center}    
       
   384 
       
   385 \noindent
       
   386 For most time we use the infix notation $\backslash$ to mean $\bder$ for brevity when
       
   387 there is no danger of confusion with derivatives on plain regular expressions, 
       
   388 for example, the above can be expressed as
       
   389 \begin{center}
       
   390   \begin{tabular}{@{}lcl@{}}
       
   391   $(_{bs}a^*)\,\backslash c$ & $\dn$ &
       
   392       $_{bs}(\textit{fuse}\, [Z] \; a\,\backslash c)\cdot
       
   393        (_{[]}a^*))$
       
   394 \end{tabular}    
       
   395 \end{center}   
       
   396 
       
   397 \noindent
       
   398 Using the picture we used earlier to depict this, the transformation when 
       
   399 taking a derivative w.r.t a star is like below:
       
   400 
       
   401 \begin{tabular}{@{}l@{\hspace{1mm}}l@{\hspace{0mm}}c@{}}
       
   402 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
       
   403     \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
       
   404         {$bs$
       
   405          \nodepart{two} $a^*$ };
       
   406 %\caption{term 1 \ref{term:1}'s matching configuration}
       
   407 \end{tikzpicture} 
       
   408 &
       
   409 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
       
   410     \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
       
   411         {$v_{\text{previous iterations}}$
       
   412          \nodepart{two} $a^*$};
       
   413 %\caption{term 1 \ref{term:1}'s matching configuration}
       
   414 \end{tikzpicture}
       
   415 \\
       
   416 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
       
   417     \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
       
   418         { $bs$ + [Z]
       
   419          \nodepart{two}  $(a\backslash c )\cdot a^*$ };
       
   420 %\caption{term 1 \ref{term:1}'s matching configuration}
       
   421 \end{tikzpicture}
       
   422 &
       
   423 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
       
   424     \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
       
   425         {$v_{\text{previous iterations}}$ + 1 more iteration
       
   426          \nodepart{two} $(a\backslash c )\cdot a^*$ };
       
   427 %\caption{term 1 \ref{term:1}'s matching configuration}
       
   428 \end{tikzpicture}
       
   429 \end{tabular}    
       
   430 \noindent
       
   431 Another place in the $\bder$ function where it differs
       
   432 from normal derivatives (on un-annotated regular expressions)
       
   433 is the sequence case:
       
   434 \begin{center}
       
   435   \begin{tabular}{@{}lcl@{}}
       
   436 
       
   437   $(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ &
       
   438      $\textit{if}\;\textit{bnullable}\,a_1$\\
       
   439 					       & &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\
       
   440 					       & &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\
       
   441   & &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$
       
   442 \end{tabular}    
       
   443 \end{center}    
       
   444 
       
   445 The difference is that (when $a_1$ is $\bnullable$)
       
   446 we use $\bmkeps$ to store the lexing information
       
   447 in $a_1$ before collapsing it (as it has been fully matched by string prior to $c$, 
       
   448 and attach the collected bit-codes to the front of $a_2$
       
   449 before throwing away $a_1$. We assume that $\bmkeps$ correctly extracts the bitcode for how $a_1$
       
   450 matches the string prior to $c$ (more on this later).
       
   451 The bitsequence $\textit{bs}$ which was initially attached to the first element of the sequence
       
   452 $a_1 \cdot a_2$, has now been elevated to the top level of teh $\sum$. 
       
   453 This is because this piece of information will be needed whichever way the sequence is matched,
       
   454 regardless of whether $c$ belongs to $a_1$ or $a_2$.
       
   455 
       
   456 In the injection-based lexing, $r_1$ is immediately thrown away in 
       
   457 subsequent derivatives on the right branch (when $r_1$ is $\nullable$),
       
   458 \begin{center}
       
   459 	$(r_1 \cdot r_2 )\backslash c = (r_1 \backslash c) \cdot r_2 + r_2 \backslash c$
       
   460 \end{center}
       
   461 \noindent
       
   462 as it knows $r_1$ is stored on stack and available once the recursive 
       
   463 call to later derivatives finish.
       
   464 Therefore, if the $\Right$ branch is taken in a $\POSIX$ match,
       
   465 we construct back the sequence value once step back by
       
   466 calling a on $\mkeps(r_1)$.
       
   467 \begin{center}
       
   468 	\begin{tabular}{lcr}
       
   469 		$\ldots r_1 \cdot r_2$ & $\rightarrow$ & $r_1\cdot r_2 + r_2 \backslash c \ldots $\\
       
   470 		$\ldots \Seq(v_1, v_2) (\Seq(\mkeps(r1), (\inj \; r_2 \; c\; v_{2c})))$ & $\leftarrow$ & $\Right(v_{2c})\ldots$
       
   471 	\end{tabular}
       
   472 \end{center}
       
   473 \noindent
       
   474 The rest of the clauses of $\bder$ is rather similar to
       
   475 $\der$, and is put together here as a wholesome definition
       
   476 for $\bder$:
       
   477 \begin{center}
   381 \begin{center}
   478   \begin{tabular}{@{}lcl@{}}
   382   \begin{tabular}{@{}lcl@{}}
   479   $(\ZERO)\,\backslash c$ & $\dn$ & $\ZERO$\\  
   383   $(\ZERO)\,\backslash c$ & $\dn$ & $\ZERO$\\  
   480   $(_{bs}\ONE)\,\backslash c$ & $\dn$ & $\ZERO$\\  
   384   $(_{bs}\ONE)\,\backslash c$ & $\dn$ & $\ZERO$\\  
   481   $(_{bs}{\bf d})\,\backslash c$ & $\dn$ &
   385   $(_{bs}{\bf d})\,\backslash c$ & $\dn$ &
   491   $(_{bs}a^*)\,\backslash c$ & $\dn$ &
   395   $(_{bs}a^*)\,\backslash c$ & $\dn$ &
   492       $_{bs}(\textit{fuse}\, [Z] \; r\,\backslash c)\cdot
   396       $_{bs}(\textit{fuse}\, [Z] \; r\,\backslash c)\cdot
   493        (_{[]}r^*))$
   397        (_{[]}r^*))$
   494 \end{tabular}    
   398 \end{tabular}    
   495 \end{center}    
   399 \end{center}    
       
   400 \noindent
       
   401 We give the intuition behind some of the more involved cases in 
       
   402 $\bder$. For example,
       
   403 in the \emph{star} case,
       
   404 a derivative on $_{bs}a^*$ means 
       
   405 that one more star iteratoin needs to be taken.
       
   406 we need to unfold it into a sequence,
       
   407 and attach an additional bit $Z$ to the front of $r \backslash c$
       
   408 as a record to indicate one new star iteration is unfolded.
       
   409 
       
   410 \noindent
       
   411 \begin{center}
       
   412   \begin{tabular}{@{}lcl@{}}
       
   413   $(_{bs}a^*)\,\backslash c$ & $\dn$ &
       
   414   $_{bs}(\underbrace{\textit{fuse}\, [Z] \; a\,\backslash c}_{\text{One more iteration}})\cdot
       
   415        (_{[]}a^*))$
       
   416 \end{tabular}    
       
   417 \end{center}   
       
   418 
       
   419 \noindent
       
   420 This information will be recovered later by the $\decode$ function.
       
   421 The intuition is that the bit $Z$ will be decoded at the right location,
       
   422 because we accumulate bits from left to right (a rigorous proof will be given
       
   423 later).
       
   424 
       
   425 \begin{tikzpicture}[ > = stealth, % arrow head style
       
   426         shorten > = 1pt, % don't touch arrow head to node
       
   427         semithick % line style
       
   428     ]
       
   429 
       
   430     \tikzstyle{every state}=[
       
   431         draw = black,
       
   432         thin,
       
   433         fill = cyan!29,
       
   434         minimum size = 7mm
       
   435     ]
       
   436     \begin{scope}[node distance=1cm and 0cm, every node/.style=state]
       
   437 		\node (k) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
       
   438         {$bs$
       
   439          \nodepart{two} $a^*$ };
       
   440 	 \node (l) [below =of k, rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
       
   441         { $bs$ + [Z]
       
   442          \nodepart{two}  $(a\backslash c )\cdot a^*$ };
       
   443     \end{scope}
       
   444     \path[->] 
       
   445 	      (k) edge (l);
       
   446 \end{tikzpicture}
       
   447 
       
   448 Pictorially the process looks like below.
       
   449 Like before, the red region denotes
       
   450 previous lexing information (stored as bitcodes in $bs$).
       
   451 
       
   452 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
       
   453 	\begin{scope}[node distance=1cm]   
       
   454 		\node (a) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
       
   455         {$bs$
       
   456          \nodepart{two} $a^*$ };
       
   457 	 \node (b) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
       
   458         { $bs$ + [Z]
       
   459          \nodepart{two}  $(a\backslash c )\cdot a^*$ };
       
   460 %\caption{term 1 \ref{term:1}'s matching configuration}
       
   461  	\end{scope}
       
   462 \end{tikzpicture}
       
   463 
       
   464 \noindent
       
   465 Another place in the $\bder$ function where it differs
       
   466 from normal derivatives (on un-annotated regular expressions)
       
   467 is the sequence case:
       
   468 \begin{center}
       
   469   \begin{tabular}{@{}lcl@{}}
       
   470 
       
   471   $(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ &
       
   472      $\textit{if}\;\textit{bnullable}\,a_1$\\
       
   473 					       & &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\
       
   474 					       & &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\
       
   475   & &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$
       
   476 \end{tabular}    
       
   477 \end{center}    
       
   478 
       
   479 The difference is that (when $a_1$ is $\bnullable$)
       
   480 we use $\bmkeps$ to store the lexing information
       
   481 in $a_1$ before collapsing it (as it has been fully matched by string prior to $c$, 
       
   482 and attach the collected bit-codes to the front of $a_2$
       
   483 before throwing away $a_1$. We assume that $\bmkeps$ correctly extracts the bitcode for how $a_1$
       
   484 matches the string prior to $c$ (more on this later).
       
   485 The bitsequence $\textit{bs}$ which was initially attached to the first element of the sequence
       
   486 $a_1 \cdot a_2$, has now been elevated to the top level of teh $\sum$. 
       
   487 This is because this piece of information will be needed whichever way the sequence is matched,
       
   488 regardless of whether $c$ belongs to $a_1$ or $a_2$.
       
   489 
       
   490 In the injection-based lexing, $r_1$ is immediately thrown away in 
       
   491 subsequent derivatives on the right branch (when $r_1$ is $\nullable$),
       
   492 \begin{center}
       
   493 	$(r_1 \cdot r_2 )\backslash c = (r_1 \backslash c) \cdot r_2 + r_2 \backslash c$
       
   494 \end{center}
       
   495 \noindent
       
   496 as it knows $r_1$ is stored on stack and available once the recursive 
       
   497 call to later derivatives finish.
       
   498 Therefore, if the $\Right$ branch is taken in a $\POSIX$ match,
       
   499 we construct back the sequence value once step back by
       
   500 calling a on $\mkeps(r_1)$.
       
   501 \begin{center}
       
   502 	\begin{tabular}{lcr}
       
   503 		$\ldots r_1 \cdot r_2$ & $\rightarrow$ & $r_1\cdot r_2 + r_2 \backslash c \ldots $\\
       
   504 		$\ldots \Seq(v_1, v_2) (\Seq(\mkeps(r1), (\inj \; r_2 \; c\; v_{2c})))$ & $\leftarrow$ & $\Right(v_{2c})\ldots$
       
   505 	\end{tabular}
       
   506 \end{center}
       
   507 \noindent
       
   508 The rest of the clauses of $\bder$ is rather similar to
       
   509 $\der$, and is put together here as a wholesome definition
       
   510 for $\bder$:
   496 \noindent
   511 \noindent
   497 Generalising the derivative operation with bitcodes to strings, we have 
   512 Generalising the derivative operation with bitcodes to strings, we have 
   498 \begin{center}
   513 \begin{center}
   499 	\begin{tabular}{@{}lcl@{}}
   514 	\begin{tabular}{@{}lcl@{}}
   500 		$a\backslash_s [] $ & $\dn$ & $a$\\
   515 		$a\backslash_s [] $ & $\dn$ & $a$\\