212 \end{tabular} |
207 \end{tabular} |
213 \end{center} |
208 \end{center} |
214 %\end{definition} |
209 %\end{definition} |
215 |
210 |
216 \noindent |
211 \noindent |
217 The function $\decode'$ returns a pair consisting of a partially decoded value and some leftover: |
212 The function $\decode'$ returns a pair consisting of |
218 $\decode'$ does most of the job while $\decode$ throws |
213 a partially decoded value and some leftover bit list that cannot |
219 away leftover bit-codes and returns the value only. |
214 be decide yet. |
|
215 The function $\decode'$ succeeds if the left-over |
|
216 bit-sequence is empty. |
220 $\decode$ is terminating as $\decode'$ is terminating. |
217 $\decode$ is terminating as $\decode'$ is terminating. |
221 We have the property that $\decode$ and $\code$ are |
218 $\decode'$ is terminating |
|
219 because at least one of $\decode'$'s parameters will go down in terms |
|
220 of size. |
|
221 Assuming we have a value $v$ and regular expression $r$ |
|
222 with $\vdash v:r$, |
|
223 then we have the property that $\decode$ and $\code$ are |
222 reverse operations of one another: |
224 reverse operations of one another: |
223 \begin{lemma} |
225 \begin{lemma} |
224 \[\vdash v : r \implies \decode \; (\code \; v) \; r = \textit{Some}(v) \] |
226 \[If \vdash v : r \; then \;\decode \; (\code \; v) \; r = \textit{Some}(v) \] |
225 \end{lemma} |
227 \end{lemma} |
226 \begin{proof} |
228 \begin{proof} |
227 By proving a more general version of the lemma, on $\decode'$: |
229 By proving a more general version of the lemma, on $\decode'$: |
228 \[\vdash v : r \implies \decode' \; ((\code \; v) @ ds) \; r = (v, ds) \] |
230 \[\vdash v : r \implies \decode' \; ((\code \; v) @ ds) \; r = (v, ds) \] |
229 Then setting $ds$ to be $[]$ and unfolding $\decode$ definition |
231 Then setting $ds$ to be $[]$ and unfolding $\decode$ definition, |
230 we get the lemma. |
232 we obtain the property. |
231 \end{proof} |
233 \end{proof} |
232 With the $\code$ and $\decode$ functions in hand, we know how to |
234 With the $\code$ and $\decode$ functions in hand, we know how to |
233 switch between bit-codes and value--the two different representations of |
235 switch between bit-codes and values. |
234 lexing information. |
236 The next step is to integrate this information into regular expression. |
235 The next step is to integrate this information into the working regular expression. |
|
236 Attaching bits to the front of regular expressions is the solution Sulzamann and Lu |
237 Attaching bits to the front of regular expressions is the solution Sulzamann and Lu |
237 gave for storing partial values on the fly: |
238 gave for storing partial values in regular expressions. |
|
239 Annotated regular expressions are therefore defined as the Isabelle |
|
240 datatype: |
238 |
241 |
239 \begin{center} |
242 \begin{center} |
240 \begin{tabular}{lcl} |
243 \begin{tabular}{lcl} |
241 $\textit{a}$ & $::=$ & $\ZERO$\\ |
244 $\textit{a}$ & $::=$ & $\ZERO$\\ |
242 & $\mid$ & $_{bs}\ONE$\\ |
245 & $\mid$ & $_{bs}\ONE$\\ |
247 \end{tabular} |
250 \end{tabular} |
248 \end{center} |
251 \end{center} |
249 %(in \textit{ALTS}) |
252 %(in \textit{ALTS}) |
250 |
253 |
251 \noindent |
254 \noindent |
252 We call these regular expressions carrying bit-codes \emph{Annotated regular expressions}. |
255 where $bs$ stands for bit-codes, $a$ for $\mathbf{a}$nnotated regular |
253 $bs$ stands for bit-codes, $a$ for $\mathbf{a}$nnotated regular |
256 expressions and $as$ for lists of annotated regular expressions. |
254 expressions and $as$ for a list of annotated regular expressions. |
257 The alternative constructor, written, $\sum$, has been generalised to |
255 The alternative constructor ($\sum$) has been generalised to |
258 accept a list of annotated regular expressions rather than just two. |
256 accept a list of annotated regular expressions rather than just 2. |
259 Why is it generalised? This is because when we open up nested |
257 |
260 alternatives, there could be more than two elements at the same level |
258 |
261 after de-duplication, which can no longer be stored in a binary |
259 The first thing we define related to bit-coded regular expressions |
262 constructor. |
260 is how we move bits, for instance pasting it at the front of an annotated regular expression. |
263 |
261 The operation $\fuse$ is just to attach bit-codes |
264 The first operation we define related to bit-coded regular expressions |
|
265 is how we move bits to the inside of regular expressions. |
|
266 Called $\fuse$, this operation is attaches bit-codes |
262 to the front of an annotated regular expression: |
267 to the front of an annotated regular expression: |
263 \begin{center} |
268 \begin{center} |
264 \begin{tabular}{lcl} |
269 \begin{tabular}{lcl} |
265 $\textit{fuse}\;bs \; \ZERO$ & $\dn$ & $\ZERO$\\ |
270 $\textit{fuse}\;bs \; \ZERO$ & $\dn$ & $\ZERO$\\ |
266 $\textit{fuse}\;bs\; _{bs'}\ONE$ & $\dn$ & |
271 $\textit{fuse}\;bs\; _{bs'}\ONE$ & $\dn$ & |
301 \end{center} |
304 \end{center} |
302 %\end{definition} |
305 %\end{definition} |
303 |
306 |
304 \noindent |
307 \noindent |
305 We use an up arrow with postfix notation |
308 We use an up arrow with postfix notation |
306 to denote the operation, |
309 to denote this operation. |
307 for convenience. The $\textit{internalise} \; r$ |
310 for convenience. The $\textit{internalise} \; r$ |
308 notation is more cumbersome. |
311 notation is more cumbersome. |
309 The opposite of $\textit{internalise}$ is |
312 The opposite of $\textit{internalise}$ is |
310 $\erase$, where all the bit-codes are removed, |
313 $\erase$, where all the bit-codes are removed, |
311 and the alternative operator $\sum$ for annotated |
314 and the alternative operator $\sum$ for annotated |
312 regular expressions is transformed to the binary alternatives |
315 regular expressions is transformed to the binary alternatives |
313 for plain regular expressions. |
316 for plain regular expressions. |
314 \begin{center} |
317 \begin{center} |
315 \begin{tabular}{lcr} |
318 \begin{tabular}{lcl} |
316 $\erase \; \ZERO$ & $\dn$ & $\ZERO$\\ |
319 $\ZERO_\downarrow$ & $\dn$ & $\ZERO$\\ |
317 $\erase \; _{bs}\ONE$ & $\dn$ & $\ONE$\\ |
320 $( _{bs}\ONE )_\downarrow$ & $\dn$ & $\ONE$\\ |
318 $\erase \; _{bs}\mathbf{c}$ & $\dn$ & $\mathbf{c}$\\ |
321 $( _{bs}\mathbf{c} )_\downarrow$ & $\dn$ & $\mathbf{c}$\\ |
319 $\erase \; _{bs} a_1 \cdot a_2$ & $\dn$ & $\erase \; r_1\cdot\erase\; r_2$\\ |
322 $( _{bs} a_1 \cdot a_2 )_\downarrow$ & $\dn$ & |
320 $\erase \; _{bs} [] $ & $\dn$ & $\ZERO$\\ |
323 $ (a_1) _\downarrow \cdot (a_2) _\downarrow$\\ |
321 $\erase \; _{bs} [a] $ & $\dn$ & $\erase \; a$\\ |
324 $( _{bs} [])_\downarrow $ & $\dn$ & $\ZERO $\\ |
322 $\erase \; _{bs} \sum [a_1, \; a_2]$ & $\dn$ & $\erase \; a_1 +\erase \; a_2$\\ |
325 $( _{bs} [a] )_\downarrow$ & $\dn$ & $a_\downarrow$\\ |
323 $\erase \; _{bs} \sum (a :: as)$ & $\dn$ & $\erase \; a + \erase \; _{[]} \sum as$\\ |
326 $_{bs} \sum [a_1, \; a_2]$ & $\dn$ & $ (a_1) _\downarrow + ( a_2 ) _\downarrow $\\ |
324 $\erase \; _{bs} a^*$ & $\dn$ & $(\erase \; a)^*$ |
327 $(_{bs} \sum (a :: as))_\downarrow$ & $\dn$ & $ a_\downarrow + \; (_{[]} \sum as)_\downarrow$\\ |
|
328 $( _{bs} a^* )_\downarrow$ & $\dn$ & $(a_\downarrow)^*$ |
325 \end{tabular} |
329 \end{tabular} |
326 \end{center} |
330 \end{center} |
327 \noindent |
331 \noindent |
328 We also abbreviate the $\erase\; a$ operation |
332 We also abbreviate the $\erase\; a$ operation |
329 as $a_\downarrow$, for conciseness. |
333 as $a_\downarrow$, for conciseness. |
330 For bit-coded regular expressions, as a different datatype, |
334 For bit-coded regular expressions, as a different datatype, |
331 testing whether they contain empty string in their lauguage requires |
335 testing whether they contain empty string in their lauguage requires |
332 a dedicated function $\bnullable$ |
336 a dedicated function $\bnullable$ |
333 which simply calls $\erase$ first before testing whether it is $\nullable$. |
337 which simply calls $\erase$ first before testing whether it is $\nullable$. |
334 \begin{center} |
338 \begin{definition} |
335 \begin{tabular}{lcr} |
339 $\bnullable \; a \dn \nullable \; (a_\downarrow)$ |
336 $\bnullable \; a $ & $\dn$ & $\nullable \; (a_\downarrow)$ |
340 \end{definition} |
337 \end{tabular} |
|
338 \end{center} |
|
339 The function for collecting the |
341 The function for collecting the |
340 bitcodes of a $\bnullable$ annotated regular expression |
342 bitcodes at the end of the derivative |
|
343 phase from a (b)nullable regular expression |
341 is a generalised version of the $\textit{mkeps}$ function |
344 is a generalised version of the $\textit{mkeps}$ function |
342 for annotated regular expressions, called $\textit{bmkeps}$: |
345 for annotated regular expressions, called $\textit{bmkeps}$: |
343 |
346 |
344 |
347 |
345 %\begin{definition}[\textit{bmkeps}]\mbox{} |
348 %\begin{definition}[\textit{bmkeps}]\mbox{} |
357 \end{tabular} |
360 \end{tabular} |
358 \end{center} |
361 \end{center} |
359 %\end{definition} |
362 %\end{definition} |
360 |
363 |
361 \noindent |
364 \noindent |
362 This function completes the value information by travelling along the |
365 $\bmkeps$ completes the value information by travelling along the |
363 path of the regular expression that corresponds to a POSIX value and |
366 path of the regular expression that corresponds to a POSIX value and |
364 collecting all the bitcodes, and using $S$ to indicate the end of star |
367 collecting all the bitcodes, and attaching $S$ to indicate the end of star |
365 iterations. |
368 iterations. |
366 |
369 |
367 The most central question is how these partial lexing information |
370 Now we give out the central part of this lexing algorithm, |
368 represented as bit-codes is augmented and carried around |
371 the $\bder$ function (stands for \emph{b}itcoded-derivative). |
369 during a derivative is taken. |
372 For most time we use the infix notation $(\_\backslash\_)$ |
370 This is done by adding bitcodes to the |
373 to mean $\bder$ for brevity when |
371 derivatives, for example when one more star iteratoin is taken (we |
374 there is no danger of confusion with derivatives on plain regular expressions. |
372 call the operation of derivatives on annotated regular expressions $\bder$ |
375 For example, we write $( _{[]}r^* ) \backslash c$ instead of $\bder \;c \; _{[]}r^*$, |
373 because it is derivatives on regular expressiones with \emph{b}itcodes), |
376 as the bitcodes at the front of $r^*$ indicates that it is |
374 we need to unfold it into a sequence, |
377 a bit-coded regular expression, not a plain one. |
375 and attach an additional bit $Z$ to the front of $r \backslash c$ |
378 $\bder$ tells us how regular expressions can be recursively traversed, |
376 to indicate one more star iteration. |
379 where the bitcodes are augmented and carried around |
377 \begin{center} |
380 when a derivative is taken. |
378 \begin{tabular}{@{}lcl@{}} |
|
379 $\bder \; c\; (_{bs}a^*) $ & $\dn$ & |
|
380 $_{bs}(\textit{fuse}\, [Z] \; \bder \; c \; a)\cdot |
|
381 (_{[]}a^*))$ |
|
382 \end{tabular} |
|
383 \end{center} |
|
384 |
|
385 \noindent |
|
386 For most time we use the infix notation $\backslash$ to mean $\bder$ for brevity when |
|
387 there is no danger of confusion with derivatives on plain regular expressions, |
|
388 for example, the above can be expressed as |
|
389 \begin{center} |
|
390 \begin{tabular}{@{}lcl@{}} |
|
391 $(_{bs}a^*)\,\backslash c$ & $\dn$ & |
|
392 $_{bs}(\textit{fuse}\, [Z] \; a\,\backslash c)\cdot |
|
393 (_{[]}a^*))$ |
|
394 \end{tabular} |
|
395 \end{center} |
|
396 |
|
397 \noindent |
|
398 Using the picture we used earlier to depict this, the transformation when |
|
399 taking a derivative w.r.t a star is like below: |
|
400 |
|
401 \begin{tabular}{@{}l@{\hspace{1mm}}l@{\hspace{0mm}}c@{}} |
|
402 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
|
403 \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
|
404 {$bs$ |
|
405 \nodepart{two} $a^*$ }; |
|
406 %\caption{term 1 \ref{term:1}'s matching configuration} |
|
407 \end{tikzpicture} |
|
408 & |
|
409 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
|
410 \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
|
411 {$v_{\text{previous iterations}}$ |
|
412 \nodepart{two} $a^*$}; |
|
413 %\caption{term 1 \ref{term:1}'s matching configuration} |
|
414 \end{tikzpicture} |
|
415 \\ |
|
416 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
|
417 \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
|
418 { $bs$ + [Z] |
|
419 \nodepart{two} $(a\backslash c )\cdot a^*$ }; |
|
420 %\caption{term 1 \ref{term:1}'s matching configuration} |
|
421 \end{tikzpicture} |
|
422 & |
|
423 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
|
424 \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
|
425 {$v_{\text{previous iterations}}$ + 1 more iteration |
|
426 \nodepart{two} $(a\backslash c )\cdot a^*$ }; |
|
427 %\caption{term 1 \ref{term:1}'s matching configuration} |
|
428 \end{tikzpicture} |
|
429 \end{tabular} |
|
430 \noindent |
|
431 Another place in the $\bder$ function where it differs |
|
432 from normal derivatives (on un-annotated regular expressions) |
|
433 is the sequence case: |
|
434 \begin{center} |
|
435 \begin{tabular}{@{}lcl@{}} |
|
436 |
|
437 $(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ & |
|
438 $\textit{if}\;\textit{bnullable}\,a_1$\\ |
|
439 & &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\ |
|
440 & &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\ |
|
441 & &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$ |
|
442 \end{tabular} |
|
443 \end{center} |
|
444 |
|
445 The difference is that (when $a_1$ is $\bnullable$) |
|
446 we use $\bmkeps$ to store the lexing information |
|
447 in $a_1$ before collapsing it (as it has been fully matched by string prior to $c$, |
|
448 and attach the collected bit-codes to the front of $a_2$ |
|
449 before throwing away $a_1$. We assume that $\bmkeps$ correctly extracts the bitcode for how $a_1$ |
|
450 matches the string prior to $c$ (more on this later). |
|
451 The bitsequence $\textit{bs}$ which was initially attached to the first element of the sequence |
|
452 $a_1 \cdot a_2$, has now been elevated to the top level of teh $\sum$. |
|
453 This is because this piece of information will be needed whichever way the sequence is matched, |
|
454 regardless of whether $c$ belongs to $a_1$ or $a_2$. |
|
455 |
|
456 In the injection-based lexing, $r_1$ is immediately thrown away in |
|
457 subsequent derivatives on the right branch (when $r_1$ is $\nullable$), |
|
458 \begin{center} |
|
459 $(r_1 \cdot r_2 )\backslash c = (r_1 \backslash c) \cdot r_2 + r_2 \backslash c$ |
|
460 \end{center} |
|
461 \noindent |
|
462 as it knows $r_1$ is stored on stack and available once the recursive |
|
463 call to later derivatives finish. |
|
464 Therefore, if the $\Right$ branch is taken in a $\POSIX$ match, |
|
465 we construct back the sequence value once step back by |
|
466 calling a on $\mkeps(r_1)$. |
|
467 \begin{center} |
|
468 \begin{tabular}{lcr} |
|
469 $\ldots r_1 \cdot r_2$ & $\rightarrow$ & $r_1\cdot r_2 + r_2 \backslash c \ldots $\\ |
|
470 $\ldots \Seq(v_1, v_2) (\Seq(\mkeps(r1), (\inj \; r_2 \; c\; v_{2c})))$ & $\leftarrow$ & $\Right(v_{2c})\ldots$ |
|
471 \end{tabular} |
|
472 \end{center} |
|
473 \noindent |
|
474 The rest of the clauses of $\bder$ is rather similar to |
|
475 $\der$, and is put together here as a wholesome definition |
|
476 for $\bder$: |
|
477 \begin{center} |
381 \begin{center} |
478 \begin{tabular}{@{}lcl@{}} |
382 \begin{tabular}{@{}lcl@{}} |
479 $(\ZERO)\,\backslash c$ & $\dn$ & $\ZERO$\\ |
383 $(\ZERO)\,\backslash c$ & $\dn$ & $\ZERO$\\ |
480 $(_{bs}\ONE)\,\backslash c$ & $\dn$ & $\ZERO$\\ |
384 $(_{bs}\ONE)\,\backslash c$ & $\dn$ & $\ZERO$\\ |
481 $(_{bs}{\bf d})\,\backslash c$ & $\dn$ & |
385 $(_{bs}{\bf d})\,\backslash c$ & $\dn$ & |
491 $(_{bs}a^*)\,\backslash c$ & $\dn$ & |
395 $(_{bs}a^*)\,\backslash c$ & $\dn$ & |
492 $_{bs}(\textit{fuse}\, [Z] \; r\,\backslash c)\cdot |
396 $_{bs}(\textit{fuse}\, [Z] \; r\,\backslash c)\cdot |
493 (_{[]}r^*))$ |
397 (_{[]}r^*))$ |
494 \end{tabular} |
398 \end{tabular} |
495 \end{center} |
399 \end{center} |
|
400 \noindent |
|
401 We give the intuition behind some of the more involved cases in |
|
402 $\bder$. For example, |
|
403 in the \emph{star} case, |
|
404 a derivative on $_{bs}a^*$ means |
|
405 that one more star iteratoin needs to be taken. |
|
406 we need to unfold it into a sequence, |
|
407 and attach an additional bit $Z$ to the front of $r \backslash c$ |
|
408 as a record to indicate one new star iteration is unfolded. |
|
409 |
|
410 \noindent |
|
411 \begin{center} |
|
412 \begin{tabular}{@{}lcl@{}} |
|
413 $(_{bs}a^*)\,\backslash c$ & $\dn$ & |
|
414 $_{bs}(\underbrace{\textit{fuse}\, [Z] \; a\,\backslash c}_{\text{One more iteration}})\cdot |
|
415 (_{[]}a^*))$ |
|
416 \end{tabular} |
|
417 \end{center} |
|
418 |
|
419 \noindent |
|
420 This information will be recovered later by the $\decode$ function. |
|
421 The intuition is that the bit $Z$ will be decoded at the right location, |
|
422 because we accumulate bits from left to right (a rigorous proof will be given |
|
423 later). |
|
424 |
|
425 \begin{tikzpicture}[ > = stealth, % arrow head style |
|
426 shorten > = 1pt, % don't touch arrow head to node |
|
427 semithick % line style |
|
428 ] |
|
429 |
|
430 \tikzstyle{every state}=[ |
|
431 draw = black, |
|
432 thin, |
|
433 fill = cyan!29, |
|
434 minimum size = 7mm |
|
435 ] |
|
436 \begin{scope}[node distance=1cm and 0cm, every node/.style=state] |
|
437 \node (k) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
|
438 {$bs$ |
|
439 \nodepart{two} $a^*$ }; |
|
440 \node (l) [below =of k, rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
|
441 { $bs$ + [Z] |
|
442 \nodepart{two} $(a\backslash c )\cdot a^*$ }; |
|
443 \end{scope} |
|
444 \path[->] |
|
445 (k) edge (l); |
|
446 \end{tikzpicture} |
|
447 |
|
448 Pictorially the process looks like below. |
|
449 Like before, the red region denotes |
|
450 previous lexing information (stored as bitcodes in $bs$). |
|
451 |
|
452 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
|
453 \begin{scope}[node distance=1cm] |
|
454 \node (a) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
|
455 {$bs$ |
|
456 \nodepart{two} $a^*$ }; |
|
457 \node (b) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
|
458 { $bs$ + [Z] |
|
459 \nodepart{two} $(a\backslash c )\cdot a^*$ }; |
|
460 %\caption{term 1 \ref{term:1}'s matching configuration} |
|
461 \end{scope} |
|
462 \end{tikzpicture} |
|
463 |
|
464 \noindent |
|
465 Another place in the $\bder$ function where it differs |
|
466 from normal derivatives (on un-annotated regular expressions) |
|
467 is the sequence case: |
|
468 \begin{center} |
|
469 \begin{tabular}{@{}lcl@{}} |
|
470 |
|
471 $(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ & |
|
472 $\textit{if}\;\textit{bnullable}\,a_1$\\ |
|
473 & &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\ |
|
474 & &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\ |
|
475 & &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$ |
|
476 \end{tabular} |
|
477 \end{center} |
|
478 |
|
479 The difference is that (when $a_1$ is $\bnullable$) |
|
480 we use $\bmkeps$ to store the lexing information |
|
481 in $a_1$ before collapsing it (as it has been fully matched by string prior to $c$, |
|
482 and attach the collected bit-codes to the front of $a_2$ |
|
483 before throwing away $a_1$. We assume that $\bmkeps$ correctly extracts the bitcode for how $a_1$ |
|
484 matches the string prior to $c$ (more on this later). |
|
485 The bitsequence $\textit{bs}$ which was initially attached to the first element of the sequence |
|
486 $a_1 \cdot a_2$, has now been elevated to the top level of teh $\sum$. |
|
487 This is because this piece of information will be needed whichever way the sequence is matched, |
|
488 regardless of whether $c$ belongs to $a_1$ or $a_2$. |
|
489 |
|
490 In the injection-based lexing, $r_1$ is immediately thrown away in |
|
491 subsequent derivatives on the right branch (when $r_1$ is $\nullable$), |
|
492 \begin{center} |
|
493 $(r_1 \cdot r_2 )\backslash c = (r_1 \backslash c) \cdot r_2 + r_2 \backslash c$ |
|
494 \end{center} |
|
495 \noindent |
|
496 as it knows $r_1$ is stored on stack and available once the recursive |
|
497 call to later derivatives finish. |
|
498 Therefore, if the $\Right$ branch is taken in a $\POSIX$ match, |
|
499 we construct back the sequence value once step back by |
|
500 calling a on $\mkeps(r_1)$. |
|
501 \begin{center} |
|
502 \begin{tabular}{lcr} |
|
503 $\ldots r_1 \cdot r_2$ & $\rightarrow$ & $r_1\cdot r_2 + r_2 \backslash c \ldots $\\ |
|
504 $\ldots \Seq(v_1, v_2) (\Seq(\mkeps(r1), (\inj \; r_2 \; c\; v_{2c})))$ & $\leftarrow$ & $\Right(v_{2c})\ldots$ |
|
505 \end{tabular} |
|
506 \end{center} |
|
507 \noindent |
|
508 The rest of the clauses of $\bder$ is rather similar to |
|
509 $\der$, and is put together here as a wholesome definition |
|
510 for $\bder$: |
496 \noindent |
511 \noindent |
497 Generalising the derivative operation with bitcodes to strings, we have |
512 Generalising the derivative operation with bitcodes to strings, we have |
498 \begin{center} |
513 \begin{center} |
499 \begin{tabular}{@{}lcl@{}} |
514 \begin{tabular}{@{}lcl@{}} |
500 $a\backslash_s [] $ & $\dn$ & $a$\\ |
515 $a\backslash_s [] $ & $\dn$ & $a$\\ |