927 $\textit{SEQ}\;bs\,(\textit{fuse}\, [\Z] (r\backslash c))\, |
927 $\textit{SEQ}\;bs\,(\textit{fuse}\, [\Z] (r\backslash c))\, |
928 (\textit{STAR}\,[]\,r)$ |
928 (\textit{STAR}\,[]\,r)$ |
929 \end{tabular} |
929 \end{tabular} |
930 \end{center} |
930 \end{center} |
931 %\end{definition} |
931 %\end{definition} |
932 For instance, we attach an additional bit $\Z$ to |
932 |
933 the front of $r \backslash c$ to indicate that there |
933 For instance, when we unfold $STAR \; bs \; a$ into a sequence, we |
934 is one more star iteration when we unfold $\textit{STAR} \; bs \; a$ |
934 attach an additional bit Z to the front of $r \backslash c$ to indicate |
935 into a sequence in the last clause. |
935 that there is one more star iteration. The other example, the $SEQ$ |
936 A more subtle treatment of the bits lies in the $\textit{SEQ}$ clause |
936 clause is more subtle-- when $a_1$ is $bnullable$(here bnullable is |
937 when $a_1$ is $\textit{bnullable}$($\textit{bnullable}$ |
937 exactly the same as nullable, except that it is for annotated regular |
938 is exactly the same as $\textit{nullable}$, except that it is for |
938 expressions, therefore we omit the definition). Assume that $bmkeps$ |
939 annotated regular expressions, therefore we omit the definition). |
939 correctly extracts the bitcode for how $a_1$ matches the string prior to |
940 $\textit{bmkeps} \, a_1$ extracts the bitcode accumulated in |
940 character c(more on this later), then the right branch of $ALTS$, which |
941 $a_1$ for how the prefix $s_1$ of the input string ending at character $c$ |
941 is $fuse \; bmkeps \; a_1 (a_2 \backslash c)$ will collapse the regular |
942 is matched by the predecessor of $a_1$(before $s_1$ was chopped off). |
942 expression $a_1$(as it has already been fully matched) and store the |
943 Then $\textit{fuse}$ stores the information |
943 parsing information at the head of the regular expression $a_2 |
944 extracted by $\textit{bmkeps}$ at the head of the |
944 \backslash c$ by fusing to it. The bitsequence $bs$, which was initially |
945 regular expression $a_2 \backslash c$. |
945 attached to the head of $SEQ$, has now been elevated to the top-level of |
946 The bitsequence $bs$, which was initially |
946 ALT, as this information will be needed whichever way the $SEQ$ is |
947 attached to the head of $\textit{SEQ}$, has |
947 matched--no matter whether c belongs to $a_1$ or $ a_2$. After carefully |
948 now been elevated to the top-level $\textit{ALT}$, |
948 doing these derivatives and maintaining all the parsing information, we |
949 as this information will be needed |
949 complete the parsing by collecting the bits using a special $mkeps$ |
950 whichever way the $\textit{SEQ}\,bs \,a_1\,a_2$ is |
950 function for annotated regular expressions--$bmkeps$: |
951 matched--no matter whether $c$ belongs to $a_1$ or $ a_2$. |
|
952 After carefully doing these derivatives and |
|
953 maintaining all the parsing information, |
|
954 we complete the parsing by collecting the bits using |
|
955 a special $\textit{mkeps}$ function for annotated |
|
956 regular expressions--$\textit{bmkeps}$: |
|
957 |
951 |
958 |
952 |
959 %\begin{definition}[\textit{bmkeps}]\mbox{} |
953 %\begin{definition}[\textit{bmkeps}]\mbox{} |
960 \begin{center} |
954 \begin{center} |
961 \begin{tabular}{lcl} |
955 \begin{tabular}{lcl} |
1108 extensively by using the method in Figure~3 as a reference but yet |
1102 extensively by using the method in Figure~3 as a reference but yet |
1109 have to work out all proof details.'' |
1103 have to work out all proof details.'' |
1110 \end{quote} |
1104 \end{quote} |
1111 |
1105 |
1112 \noindent |
1106 \noindent |
1113 We would settle the correctness claim. |
1107 We would settle the correctness claim. It is relatively straightforward |
1114 It is relatively straightforward to establish that after one simplification step, |
1108 to establish that after one simplification step, the part of a nullable |
1115 the part of a nullable derivative that corresponds to a POSIX value |
1109 derivative that corresponds to a POSIX value remains intact and can |
1116 remains intact and can still be collected, |
1110 still be collected, in other words, |
1117 in other words, |
1111 |
1118 \begin{center} |
1112 \begin{center} |
1119 $\textit{bmkeps} \; r = \textit{bmkeps} \; \textit{simp} \; r\;( r\; \textit{nullable})$ |
1113 $\textit{bmkeps} \; r = \textit{bmkeps} \; \textit{simp} \; r\;( r\; \textit{nullable})$ |
1120 \end{center} |
1114 \end{center} |
1121 as this basically comes down to proving actions like removing |
1115 |
1122 the additional $r$ in $r+r$ does not delete important POSIX information |
1116 \noindent |
1123 in a regular expression. |
1117 as this basically comes down to proving actions like removing the |
1124 The hardcore of this problem is to prove that |
1118 additional $r$ in $r+r$ does not delete important POSIX information in |
|
1119 a regular expression. The hardcore of this problem is to prove that |
|
1120 |
1125 \begin{center} |
1121 \begin{center} |
1126 $\textit{bmkeps} \; \textit{blexer}\_{simp} \; r = \textit{bmkeps} \; \textit{blexer} \; \textit{simp} \; r$ |
1122 $\textit{bmkeps} \; \textit{blexer}\_{simp} \; r = \textit{bmkeps} \; \textit{blexer} \; \textit{simp} \; r$ |
1127 \end{center} |
1123 \end{center} |
|
1124 |
|
1125 \noindent |
1128 That is, if we do derivative on regular expression r and the simplified version, |
1126 That is, if we do derivative on regular expression r and the simplified version, |
1129 they can still provide the same POSIX value if there is one . |
1127 they can still provide the same POSIX value if there is one . |
1130 This is not as straightforward as the previous proposition, as the two regular expressions $r$ and $\textit{simp}\; r$ |
1128 This is not as straightforward as the previous proposition, as the two regular expressions $r$ and $\textit{simp}\; r$ |
1131 might become very different regular expressions after repeated application of $\textit{simp}$ and derivative. |
1129 might become very different regular expressions after repeated application of $\textit{simp}$ and derivative. |
1132 The crucial point is to find the indispensable information of |
1130 The crucial point is to find the indispensable information of |