1204 %the current derivative matches the suffix of the string(the characters that |
1204 %the current derivative matches the suffix of the string(the characters that |
1205 %have not yet appeared, but will appear as the successive derivatives go on. |
1205 %have not yet appeared, but will appear as the successive derivatives go on. |
1206 %How do we get this "future" information? By the value $v$, which is |
1206 %How do we get this "future" information? By the value $v$, which is |
1207 %computed by a pass of the algorithm that uses |
1207 %computed by a pass of the algorithm that uses |
1208 %$inj$ as described in the previous section). |
1208 %$inj$ as described in the previous section). |
1209 using information from both the derivative regular expression and the value. |
1209 using information from both the derivative regular expression and the |
1210 Sulzmann and Lu used this |
1210 value. Sulzmann and Lu poroposed this function, but did not prove |
1211 to connect the bitcoded algorithm to the older algorithm by the following |
1211 anything about it. Ausaf and Urban used it to connect the bitcoded |
1212 equation: |
1212 algorithm to the older algorithm by the following equation: |
1213 |
1213 |
1214 \begin{center} $inj \;a\; c \; v = \textit{decode} \; (\textit{retrieve}\; |
1214 \begin{center} $inj \;a\; c \; v = \textit{decode} \; (\textit{retrieve}\; |
1215 ((\textit{internalise}\; r)\backslash_{simp} c) v)$ |
1215 (r^\uparrow)\backslash_{simp} \,c)\,v)$ |
1216 \end{center} |
1216 \end{center} |
1217 A little fact that needs to be stated to aid comprehension: |
1217 |
1218 \begin{center} |
1218 \noindent |
1219 $r^\uparrow = a$($a$ stands for $\textit{annotated}).$ |
1219 whereby $r^\uparrow$ stands for the internalised version of $r$. Ausaf |
1220 \end{center} |
1220 and Urban also used this fact to prove the correctness of bitcoded |
1221 Ausaf and Urban also used this fact to prove the |
1221 algorithm without simplification. Our purpose of using this, however, |
1222 correctness of bitcoded algorithm without simplification. Our purpose |
1222 is to establish |
1223 of using this, however, is to establish |
1223 |
1224 \begin{center} |
1224 \begin{center} |
1225 $ \textit{retrieve} \; |
1225 $ \textit{retrieve} \; |
1226 a \; v \;=\; \textit{retrieve} \; \textit{simp}(a) \; v'.$ |
1226 a \; v \;=\; \textit{retrieve} \; (\textit{simp}\,a) \; v'.$ |
1227 \end{center} |
1227 \end{center} |
1228 The idea |
1228 The idea is that using $v'$, a simplified version of $v$ that had gone |
1229 is that using $v'$, a simplified version of $v$ that had gone |
1229 through the same simplification step as $\textit{simp}(a)$, we are able |
1230 through the same simplification step as $\textit{simp}(a)$, we are |
1230 to extract the bitcode that gives the same parsing information as the |
1231 able to extract the bit-sequence that gives the same parsing |
1231 unsimplified one. However, we noticed that constructing such a $v'$ |
1232 information as the unsimplified one. After establishing this, we |
1232 from $v$ is not so straightforward. The point of this is that we might |
1233 might be able to finally bridge the gap of proving |
1233 be able to finally bridge the gap by proving |
1234 \begin{center} |
1234 |
1235 $\textit{retrieve} \; r^\uparrow \backslash s \; v = \;\textit{retrieve} \; |
1235 \begin{center} |
1236 \textit{bsimp}(r^\uparrow) \backslash s \; v'$ |
1236 $\textit{retrieve} \; (r^\uparrow \backslash s) \; v = \;\textit{retrieve} \; |
1237 \end{center} |
1237 \textit{simp}(r^\uparrow) \backslash s \; v'$ |
|
1238 \end{center} |
|
1239 |
|
1240 \noindent |
1238 and subsequently |
1241 and subsequently |
1239 \begin{center} |
1242 |
1240 $\textit{retrieve} \; r^\uparrow \backslash s \; v\; = \; \textit{retrieve} \; |
1243 \begin{center} |
1241 r^\uparrow \backslash_{simp} s \; v'$. |
1244 $\textit{retrieve} \; (r^\uparrow \backslash s) \; v\; = \; \textit{retrieve} \; |
1242 \end{center} |
1245 (r^\uparrow \backslash_{simp} \, s) \; v'$. |
1243 The $\textit{LHS}$ of the above equation is the bitcode we want. |
1246 \end{center} |
1244 This proves that our simplified |
1247 |
1245 version of regular expression still contains all the bitcodes needed. |
1248 \noindent |
1246 The task here is to find a way to compute the correct $v'$. |
1249 The $\textit{LHS}$ of the above equation is the bitcode we want. This |
|
1250 would prove that our simplified version of regular expression still |
|
1251 contains all the bitcodes needed. The task here is to find a way to |
|
1252 compute the correct $v'$. |
1247 |
1253 |
1248 The second task is to speed up the more aggressive simplification. |
1254 The second task is to speed up the more aggressive simplification. |
1249 Currently it is slower than a naive simplification(the naive version as |
1255 Currently it is slower than the original naive simplification by Ausaf |
1250 implemented in ADU of course can explode in some cases). So it needs to |
1256 and Urban (the naive version as implemented by Ausaf and Urban of |
1251 be explored how to make it faster. Our possibility would be to explore |
1257 course can ``explode'' in some cases). So it needs to be explored how to |
1252 again the connection to DFAs. This is very much work in progress. |
1258 make our algorithm faster on all inputs. Our possibility would be to |
|
1259 explore again the connection to DFAs. This is very much work in |
|
1260 progress. |
1253 |
1261 |
1254 \section{Conclusion} |
1262 \section{Conclusion} |
1255 |
1263 |
1256 In this PhD-project we are interested in fast algorithms for regular |
1264 In this PhD-project we are interested in fast algorithms for regular |
1257 expression matching. While this seems to be a ``settled'' area, in |
1265 expression matching. While this seems to be a ``settled'' area, in |