948 $\textit{SEQ}\;bs\,(\textit{fuse}\, [\Z] (r\backslash c))\, |
948 $\textit{SEQ}\;bs\,(\textit{fuse}\, [\Z] (r\backslash c))\, |
949 (\textit{STAR}\,[]\,r)$ |
949 (\textit{STAR}\,[]\,r)$ |
950 \end{tabular} |
950 \end{tabular} |
951 \end{center} |
951 \end{center} |
952 %\end{definition} |
952 %\end{definition} |
953 For instance, we attach an additional bit $\Z$ to |
953 |
954 the front of $r \backslash c$ to indicate that there |
954 For instance, when we unfold $STAR \; bs \; a$ into a sequence, we |
955 is one more star iteration when we unfold $\textit{STAR} \; bs \; a$ |
955 attach an additional bit Z to the front of $r \backslash c$ to indicate |
956 into a sequence in the last clause. |
956 that there is one more star iteration. The other example, the $SEQ$ |
957 A more subtle treatment of the bits lies in the $\textit{SEQ}$ clause |
957 clause is more subtle-- when $a_1$ is $bnullable$(here bnullable is |
958 when $a_1$ is $\textit{bnullable}$($\textit{bnullable}$ |
958 exactly the same as nullable, except that it is for annotated regular |
959 is exactly the same as $\textit{nullable}$, except that it is for |
959 expressions, therefore we omit the definition). Assume that $bmkeps$ |
960 annotated regular expressions, therefore we omit the definition). |
960 correctly extracts the bitcode for how $a_1$ matches the string prior to |
961 $\textit{bmkeps} \, a_1$ extracts the bitcode accumulated in |
961 character c(more on this later), then the right branch of $ALTS$, which |
962 $a_1$ for how the prefix $s_1$ of the input string ending at character $c$ |
962 is $fuse \; bmkeps \; a_1 (a_2 \backslash c)$ will collapse the regular |
963 is matched by the predecessor of $a_1$(before $s_1$ was chopped off). |
963 expression $a_1$(as it has already been fully matched) and store the |
964 Then $\textit{fuse}$ stores the information |
964 parsing information at the head of the regular expression $a_2 |
965 extracted by $\textit{bmkeps}$ at the head of the |
965 \backslash c$ by fusing to it. The bitsequence $bs$, which was initially |
966 regular expression $a_2 \backslash c$. |
966 attached to the head of $SEQ$, has now been elevated to the top-level of |
967 The bitsequence $bs$, which was initially |
967 ALT, as this information will be needed whichever way the $SEQ$ is |
968 attached to the head of $\textit{SEQ}$, has |
968 matched--no matter whether c belongs to $a_1$ or $ a_2$. After carefully |
969 now been elevated to the top-level $\textit{ALT}$, |
969 doing these derivatives and maintaining all the parsing information, we |
970 as this information will be needed |
970 complete the parsing by collecting the bits using a special $mkeps$ |
971 whichever way the $\textit{SEQ}\,bs \,a_1\,a_2$ is |
971 function for annotated regular expressions--$bmkeps$: |
972 matched--no matter whether $c$ belongs to $a_1$ or $ a_2$. |
|
973 After carefully doing these derivatives and |
|
974 maintaining all the parsing information, |
|
975 we complete the parsing by collecting the bits using |
|
976 a special $\textit{mkeps}$ function for annotated |
|
977 regular expressions--$\textit{bmkeps}$: |
|
978 |
972 |
979 |
973 |
980 %\begin{definition}[\textit{bmkeps}]\mbox{} |
974 %\begin{definition}[\textit{bmkeps}]\mbox{} |
981 \begin{center} |
975 \begin{center} |
982 \begin{tabular}{lcl} |
976 \begin{tabular}{lcl} |
1129 extensively by using the method in Figure~3 as a reference but yet |
1123 extensively by using the method in Figure~3 as a reference but yet |
1130 have to work out all proof details.'' |
1124 have to work out all proof details.'' |
1131 \end{quote} |
1125 \end{quote} |
1132 |
1126 |
1133 \noindent |
1127 \noindent |
1134 We would settle the correctness claim. |
1128 We would settle the correctness claim. It is relatively straightforward |
1135 It is relatively straightforward to establish that after one simplification step, |
1129 to establish that after one simplification step, the part of a nullable |
1136 the part of a nullable derivative that corresponds to a POSIX value |
1130 derivative that corresponds to a POSIX value remains intact and can |
1137 remains intact and can still be collected, |
1131 still be collected, in other words, |
1138 in other words, |
1132 |
1139 \begin{center} |
1133 \begin{center} |
1140 $\textit{bmkeps} \; r = \textit{bmkeps} \; \textit{simp} \; r\;( r\; \textit{nullable})$ |
1134 $\textit{bmkeps} \; r = \textit{bmkeps} \; \textit{simp} \; r\;( r\; \textit{nullable})$ |
1141 \end{center} |
1135 \end{center} |
1142 as this basically comes down to proving actions like removing |
1136 |
1143 the additional $r$ in $r+r$ does not delete important POSIX information |
1137 \noindent |
1144 in a regular expression. |
1138 as this basically comes down to proving actions like removing the |
1145 The hardcore of this problem is to prove that |
1139 additional $r$ in $r+r$ does not delete important POSIX information in |
|
1140 a regular expression. The hardcore of this problem is to prove that |
|
1141 |
1146 \begin{center} |
1142 \begin{center} |
1147 $\textit{bmkeps} \; \textit{blexer}\_{simp} \; r = \textit{bmkeps} \; \textit{blexer} \; \textit{simp} \; r$ |
1143 $\textit{bmkeps} \; \textit{blexer}\_{simp} \; r = \textit{bmkeps} \; \textit{blexer} \; \textit{simp} \; r$ |
1148 \end{center} |
1144 \end{center} |
|
1145 |
|
1146 \noindent |
1149 That is, if we do derivative on regular expression r and the simplified version, |
1147 That is, if we do derivative on regular expression r and the simplified version, |
1150 they can still provide the same POSIX value if there is one . |
1148 they can still provide the same POSIX value if there is one . |
1151 This is not as straightforward as the previous proposition, as the two regular expressions $r$ and $\textit{simp}\; r$ |
1149 This is not as straightforward as the previous proposition, as the two regular expressions $r$ and $\textit{simp}\; r$ |
1152 might become very different regular expressions after repeated application of $\textit{simp}$ and derivative. |
1150 might become very different regular expressions after repeated application of $\textit{simp}$ and derivative. |
1153 The crucial point is to find the indispensable information of |
1151 The crucial point is to find the indispensable information of |