608 $(r^*)^\uparrow$ & $\dn$ & |
608 $(r^*)^\uparrow$ & $\dn$ & |
609 $\textit{STAR}\;[]\,r^\uparrow$\\ |
609 $\textit{STAR}\;[]\,r^\uparrow$\\ |
610 \end{tabular} |
610 \end{tabular} |
611 \end{center} |
611 \end{center} |
612 %\end{definition} |
612 %\end{definition} |
|
613 |
|
614 Here $fuse$ is an auxiliary function that helps to attach bits to the front of an annotated regular expression. Its definition goes as follows: |
|
615 \begin{center} |
|
616 \begin{tabular}{lcl} |
|
617 $\textit{fuse}\,bs\,(\textit{ZERO})$ & $\dn$ & $\textit{ZERO}$\\ |
|
618 $\textit{fuse}\,bs\,(\textit{ONE}\,bs')$ & $\dn$ & |
|
619 $\textit{ONE}\,(bs\,@\,bs')$\\ |
|
620 $\textit{fuse}\,bs\,(\textit{CHAR}\,bs'\,c)$ & $\dn$ & |
|
621 $\textit{CHAR}\,(bs\,@\,bs')\,c$\\ |
|
622 $\textit{fuse}\,bs\,(\textit{ALT}\,bs'\,a_1\,a_2)$ & $\dn$ & |
|
623 $\textit{ALT}\,(bs\,@\,bs')\,a_1\,a_2$\\ |
|
624 $\textit{fuse}\,bs\,(\textit{SEQ}\,bs'\,a_1\,a_2)$ & $\dn$ & |
|
625 $\textit{SEQ}\,(bs\,@\,bs')\,a_1\,a_2$\\ |
|
626 $\textit{fuse}\,bs\,(\textit{STAR}\,bs'\,a)$ & $\dn$ & |
|
627 $\textit{STAR}\,(bs\,@\,bs')\,a$ |
|
628 \end{tabular} |
|
629 \end{center} |
|
630 |
613 After internalise we do successive derivative operations on the annotated regular expression. |
631 After internalise we do successive derivative operations on the annotated regular expression. |
614 Here $fuse$ is an auxiliary function that helps to attach bits to the front of an annotated regular expression. |
632 This derivative operation is the same as what we previously have for the simple regular expressions, except that we take special care of the bits :\\ |
615 This derivative operation is the same as what we previously have for the simple regular expressions, except that we take special care of the bits to store the parse tree information:\\ |
|
616 %\begin{definition}{bder} |
633 %\begin{definition}{bder} |
617 \begin{center} |
634 \begin{center} |
618 \begin{tabular}{@{}lcl@{}} |
635 \begin{tabular}{@{}lcl@{}} |
619 $(\textit{ZERO})\backslash c$ & $\dn$ & $\textit{ZERO}$\\ |
636 $(\textit{ZERO})\backslash c$ & $\dn$ & $\textit{ZERO}$\\ |
620 $(\textit{ONE}\;bs)\backslash c$ & $\dn$ & $\textit{ZERO}$\\ |
637 $(\textit{ONE}\;bs)\backslash c$ & $\dn$ & $\textit{ZERO}$\\ |
632 $\textit{SEQ}\;bs\,(\textit{fuse}\, [\Z] (r\backslash c))\, |
649 $\textit{SEQ}\;bs\,(\textit{fuse}\, [\Z] (r\backslash c))\, |
633 (\textit{STAR}\,[]\,r)$ |
650 (\textit{STAR}\,[]\,r)$ |
634 \end{tabular} |
651 \end{tabular} |
635 \end{center} |
652 \end{center} |
636 %\end{definition} |
653 %\end{definition} |
637 This way, we do not have to use an injection function and a second phase, but instead only need to collect the bits while running $mkeps$: |
654 For instance, when we unfold $STAR \; bs \; a$ into a sequence, we attach an additional bit Z to the front of $r \backslash c$ to indicate that there is one more star iteration. |
|
655 The other example, the $SEQ$ clause is more subtle-- when $a_1$ is $bnullable$(here bnullable is exactly the same as nullable, except that it is for annotated regular expressions, therefore we omit the definition). |
|
656 Assume that $bmkeps$ correctly extracts the bitcode for how $a_1$ matches the string prior to character c(more on this later), then the right branch of $ALTS$, which is $fuse \; bmkeps \; a_1 (a_2 \backslash c)$ will collapse the regular expression $a_1$(as it has already been fully matched) and store the parsing information at the head of the regular expression $a_2 \backslash c$ by fusing to it. The bitsequence $bs$, which was initially attached to the head of $SEQ$, has now been elevated to the top-level of ALT, |
|
657 as this information will be needed whichever way the $SEQ$ is matched--no matter whether c belongs to $a_1$ or $ a_2$. |
|
658 After carefully doing these derivatives and maintaining all the parsing information, we complete the parsing by collecting the bits using a special $mkeps$ function for annotated regular expressions--$bmkeps$: |
|
659 |
|
660 |
638 %\begin{definition}[\textit{bmkeps}]\mbox{} |
661 %\begin{definition}[\textit{bmkeps}]\mbox{} |
639 \begin{center} |
662 \begin{center} |
640 \begin{tabular}{lcl} |
663 \begin{tabular}{lcl} |
641 $\textit{bmkeps}\,(\textit{ONE}\,bs)$ & $\dn$ & $bs$\\ |
664 $\textit{bmkeps}\,(\textit{ONE}\,bs)$ & $\dn$ & $bs$\\ |
642 $\textit{bmkeps}\,(\textit{ALT}\,bs\,a_1\,a_2)$ & $\dn$ & |
665 $\textit{bmkeps}\,(\textit{ALT}\,bs\,a_1\,a_2)$ & $\dn$ & |
648 $\textit{bmkeps}\,(\textit{STAR}\,bs\,a)$ & $\dn$ & |
671 $\textit{bmkeps}\,(\textit{STAR}\,bs\,a)$ & $\dn$ & |
649 $bs \,@\, [\S]$ |
672 $bs \,@\, [\S]$ |
650 \end{tabular} |
673 \end{tabular} |
651 \end{center} |
674 \end{center} |
652 %\end{definition} |
675 %\end{definition} |
653 and then decode the bits using the regular expression. After putting these pieces together, the whole process looks like this:\\ |
676 This function completes the parse tree information by |
|
677 travelling along the path on the regular epxression that corresponds to a POSIX value snd collect all the bits, and |
|
678 using S to indicate the end of star iterations. If we take the bitsproduced by $bmkeps$ and decode it, |
|
679 we get the parse tree we need, the working flow looks like this:\\ |
654 \begin{center} |
680 \begin{center} |
655 \begin{tabular}{lcl} |
681 \begin{tabular}{lcl} |
656 $\textit{blexer}\;r\,s$ & $\dn$ & |
682 $\textit{blexer}\;r\,s$ & $\dn$ & |
657 $\textit{let}\;a = (r^\uparrow)\backslash s\;\textit{in}$\\ |
683 $\textit{let}\;a = (r^\uparrow)\backslash s\;\textit{in}$\\ |
658 & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ |
684 & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ |