1136 Figure 6 yields POSIX parse trees. We have tested this claim |
1136 Figure 6 yields POSIX parse trees. We have tested this claim |
1137 extensively by using the method in Figure~3 as a reference but yet |
1137 extensively by using the method in Figure~3 as a reference but yet |
1138 have to work out all proof details.'' |
1138 have to work out all proof details.'' |
1139 \end{quote} |
1139 \end{quote} |
1140 |
1140 |
1141 \noindent |
1141 \noindent We would settle this correctness claim. It is relatively |
1142 We would settle this correctness claim. It is relatively straightforward |
1142 straightforward to establish that after one simplification step, the part of a |
1143 to establish that after one simplification step, the part of a nullable |
1143 nullable derivative that corresponds to a POSIX value remains intact and can |
1144 derivative that corresponds to a POSIX value remains intact and can |
1144 still be collected, in other words, we can show that\comment{Double-check....I |
1145 still be collected, in other words, we can show that\comment{Double-check....I think this is not the case} |
1145 think this is not the case} |
1146 |
1146 |
1147 \begin{center} |
1147 \begin{center} |
1148 $\textit{bmkeps} \; r = \textit{bmkeps} \; \textit{simp} \; r\;( r\; \textit{nullable})$ |
1148 $\textit{bmkeps} \; r = \textit{bmkeps} \; \textit{simp} \; r\;( r\; \textit{nullable})$ |
1149 \end{center} |
1149 \end{center} |
1150 |
1150 |
1152 as this basically comes down to proving actions like removing the |
1152 as this basically comes down to proving actions like removing the |
1153 additional $r$ in $r+r$ does not delete important POSIX information in |
1153 additional $r$ in $r+r$ does not delete important POSIX information in |
1154 a regular expression. The hard part of this proof is to establish that |
1154 a regular expression. The hard part of this proof is to establish that |
1155 |
1155 |
1156 \begin{center} |
1156 \begin{center} |
1157 $\textit{bmkeps} \; \textit{blexer}\_{simp} \; r = \textit{bmkeps} \; \textit{blexer} \; \textit{simp} \; r$ |
1157 $\textit{bmkeps} \; \textit{blexer}\_{simp}(s, \; r) = \textit{bmkeps} \; \textit{blexer} \; \textit{simp}(s, \; r)$ |
1158 \end{center} |
1158 \end{center} |
1159 |
1159 |
1160 \noindent\comment{OK from here on you still need to work. Did not read.} |
1160 \noindent That is, if we do derivative on regular expression $r$ and then |
1161 That is, if we do derivative on regular expression r and the simplified version, |
1161 simplify it, and repeat this process until we exhaust the string, we get a |
1162 they can still provide the same POSIX value if there is one . |
1162 regular expression $r''$ that provides the POSIX matching as the result $r'$ of |
1163 This is not as straightforward as the previous proposition, as the two regular expressions $r$ and $\textit{simp}\; r$ |
1163 the normal derivative algorithm which only does derivative operation repeatedly |
1164 might become very different regular expressions after repeated application of $\textit{simp}$ and derivative. |
1164 and has no simplification at all. This might seem at first glance very |
1165 The crucial point is to find the indispensable information of |
1165 unintuitive, as the $r'$ is exponentially larger than $r''$. But this can be |
1166 a regular expression and how it is kept intact during simplification so that it performs |
1166 explained in the following way: we are pruning away the possible matches that |
1167 as good as a regular expression that has not been simplified in the subsequent derivative operations. |
1167 are not POSIX. Since there are exponentially non-POSIX matchings and only 1 |
1168 To aid this, we use the helping function retrieve described by Sulzmann and Lu: |
1168 POSIX matching, it is understandable that our $r''$ can be a lot smaller. we |
1169 \\definition of retrieve\\ |
1169 can still provide the same POSIX value if there is one. This is not as |
|
1170 straightforward as the previous proposition, as the two regular expressions $r$ |
|
1171 and $\textit{simp}\; r$ might become very different regular expressions after |
|
1172 repeated application of $\textit{simp}$ and derivative. The crucial point is |
|
1173 to find the indispensable information of a regular expression and how it is |
|
1174 kept intact during simplification so that it performs as good as a regular |
|
1175 expression that has not been simplified in the subsequent derivative |
|
1176 operations. To aid this, we use the helping function retrieve described by |
|
1177 Sulzmann and Lu: \\definition of retrieve\\ |
1170 |
1178 |
1171 This function assembled the bitcode that corresponds to a parse tree for |
1179 This function assembled the bitcode that corresponds to a parse tree for |
1172 how the current derivative matches the suffix of the string(the |
1180 how the current derivative matches the suffix of the string(the |
1173 characters that have not yet appeared, but is stored in the value). |
1181 characters that have not yet appeared, but is stored in the value). |
1174 Sulzmann and Lu used this to connect the bitcoded algorithm to the older |
1182 Sulzmann and Lu used this to connect the bitcoded algorithm to the older |
1175 algorithm by the following equation: |
1183 algorithm by the following equation: |
1176 |
1184 |
1177 \begin{center} |
1185 \begin{center} $inj \;a\; c \; v = \textit{decode} \; (\textit{retrieve}\; |
1178 $inj \;a\; c \; v = \textit{decode} \; (\textit{retrieve}\; ((\textit{internalise}\; r)\backslash_{simp} c) v)$ |
1186 ((\textit{internalise}\; r)\backslash_{simp} c) v)$ \end{center} A |
1179 \end{center} |
1187 little fact that needs to be stated to help comprehension: |
1180 A little fact that needs to be stated to help comprehension: |
1188 \begin{center} $r^\uparrow = a$($a$ stands for $\textit{annotated}).$ |
1181 \begin{center} |
1189 \end{center} Ausaf and Urban also used this fact to prove the |
1182 $r^\uparrow = a$($a$ stands for $\textit{annotated}).$ |
1190 correctness of bitcoded algorithm without simplification. Our purpose |
1183 \end{center} |
1191 of using this, however, is try to establish \\ $ \textit{retrieve} \; |
1184 Ausaf and Urban also used this fact to prove the correctness of bitcoded algorithm without simplification. |
1192 a \; v \;=\; \textit{retrieve} \; \textit{simp}(a) \; v'.$\\ The idea |
1185 Our purpose of using this, however, is try to establish \\ |
1193 is that using $v'$, a simplified version of $v$ that possibly had gone |
1186 $ \textit{retrieve} \; a \; v \;=\; \textit{retrieve} \; \textit{simp}(a) \; v'.$\\ |
1194 through the same simplification step as $\textit{simp}(a)$ we are |
1187 The idea is that using $v'$, |
1195 still able to extract the bit-sequence that gives the same parsing |
1188 a simplified version of $v$ that possibly had gone through the same simplification step as $\textit{simp}(a)$ we are still able to extract the bit-sequence that gives the same parsing information as the unsimplified one. |
1196 information as the unsimplified one. After establishing this, we |
1189 After establishing this, we might be able to finally bridge the gap of proving\\ |
1197 might be able to finally bridge the gap of proving\\ |
1190 $\textit{retrieve} \; r \backslash s \; v = \;\textit{retrieve} \; \textit{simp}(r) \backslash s \; v'$\\ |
1198 $\textit{retrieve} \; r \backslash s \; v = \;\textit{retrieve} \; |
1191 and subsequently\\ |
1199 \textit{simp}(r) \backslash s \; v'$\\ and subsequently\\ |
1192 $\textit{retrieve} \; r \backslash s \; v\; = \; \textit{retrieve} \; r \backslash_{simp} s \; v'$.\\ |
1200 $\textit{retrieve} \; r \backslash s \; v\; = \; \textit{retrieve} \; |
1193 This proves that our simplified version of regular expression still contains all the bitcodes needed. |
1201 r \backslash_{simp} s \; v'$.\\ This proves that our simplified |
|
1202 version of regular expression still contains all the bitcodes needed. |
1194 |
1203 |
1195 |
1204 |
1196 The second task is to speed up the more aggressive simplification. |
1205 The second task is to speed up the more aggressive simplification. |
1197 Currently it is slower than a naive simplification(the naive version as |
1206 Currently it is slower than a naive simplification(the naive version as |
1198 implemented in ADU of course can explode in some cases). So it needs to |
1207 implemented in ADU of course can explode in some cases). So it needs to |