1206 $\fuse$s the bit(s) $_0$ to the front of $_0\ONE $ and $_{11}a^*$. |
1206 $\fuse$s the bit(s) $_0$ to the front of $_0\ONE $ and $_{11}a^*$. |
1207 In a nutshell, the order of simplification causes |
1207 In a nutshell, the order of simplification causes |
1208 the bits to be moved differently. |
1208 the bits to be moved differently. |
1209 |
1209 |
1210 \subsubsection{A Failed Attempt To Remedy the Problem Above} |
1210 \subsubsection{A Failed Attempt To Remedy the Problem Above} |
1211 A simple class of regular expressions and strings |
1211 A simple class of regular expression and string |
1212 pairs can be deduced |
1212 pairs $(r, s)$ can be deduced from the above example |
1213 from the above example which |
1213 which trigger the difference between |
1214 trigger the difference between |
|
1215 $\rup\backslash_{simp} \, s$ |
1214 $\rup\backslash_{simp} \, s$ |
1216 and $\simp(\rup\backslash s)$: |
1215 and $\simp(\rup\backslash s)$: |
1217 \[ |
1216 \begin{center} |
1218 D = (c_1c_2, \{r_1\cdot r_2 \mid \text{$\simp(r_2) = r_2, \simp(r_1 \backslash c_1) = \ONE, r_1 \; not \; \nullable, c_2 \in L(r_2), |
1217 \begin{tabular}{lcl} |
1219 \simp(r_2 \backslash c_2)$ is of shape alternative and $c_1c_2 \notin L(r_1)$}\}) |
1218 $D =\{ (r_1 \cdot r_2,\; [c_1c_2]) \mid $ & $\simp(r_2) = r_2, \simp(r_1 \backslash c_1) = \ONE,$\\ |
1220 \] |
1219 $r_1 \; not \; \nullable, c_2 \in L(r_2),$ & $\exists \textit{rs},\textit{bs}.\; r_2 \backslash c_2 = _{bs}{\oplus rs}$\\ |
1221 The culprit is the different order in which simplification is applied. |
1220 $\exists \textit{rs}_1. \; \simp(r_2 \backslash c_2) = _{bs}{\oplus \textit{rs}_1}$ & $and \;\simp(r_1 \backslash [c_1c_2]) = \ZERO\}$\\ |
1222 For $\rup \backslash_{simp} s$, |
1221 \end{tabular} |
|
1222 \end{center} |
|
1223 We take a pair $(r, \;s)$ from the set $D$. |
|
1224 |
|
1225 Now we compute ${\bf \rup \backslash_{simp} s}$, we get: |
1223 \begin{center} |
1226 \begin{center} |
1224 \begin{tabular}{lcl} |
1227 \begin{tabular}{lcl} |
1225 $(r_1\cdot r_2)\backslash_{simp} \, [c_1c_2]$ & $= \simp\left[ \big(\simp\left[ \left( r_1\cdot r_2 \right) \backslash c_1\right] \big)\backslash c_2\right]$\\ |
1228 $(r_1\cdot r_2)\backslash_{simp} \, [c_1c_2]$ & $= \simp\left[ \big(\simp\left[ \left( r_1\cdot r_2 \right) \backslash c_1\right] \big)\backslash c_2\right]$\\ |
1226 & $= \simp\left[ \big(\simp \left[ \left(r_1 \backslash c_1\right) \cdot r_2 \right] \big) \backslash c_2 \right]$\\ |
1229 & $= \simp\left[ \big(\simp \left[ \left(r_1 \backslash c_1\right) \cdot r_2 \right] \big) \backslash c_2 \right]$\\ |
1227 & $= \simp \left[ (\fuse \; \bmkeps(r_1\backslash c_1) \; \simp(r_2) ) \backslash c_2 \right]$, |
1230 & $= \simp \left[ (\fuse \; \bmkeps(r_1\backslash c_1) \; \simp(r_2) ) \backslash c_2 \right]$,\\ |
|
1231 & $= \simp \left[ (\fuse \; \bmkeps(r_1\backslash c_1) \; r_2 ) \backslash c_2 \right]$, |
1228 \end{tabular} |
1232 \end{tabular} |
1229 \end{center} |
1233 \end{center} |
1230 \noindent |
1234 \noindent |
1231 from the definition of $D$ we know $r_1 \backslash c_1$ is nullable, therefore |
1235 from the definition of $D$ we know $r_1 \backslash c_1$ is nullable, therefore |
1232 $\bmkeps(r_1\backslash c_1)$ returns a bitcode, we shall call it |
1236 $\bmkeps(r_1\backslash c_1)$ returns a bitcode, we shall call it |
1233 $bs$. Also, $\simp(r_2 \backslash c_2)$ is a certain alternative, |
1237 $\textit{bs}_2$. |
1234 we shall call it $_{bs_1}\oplus rs$, and $( \left(r_1 \backslash c_1\right) \backslash c_2 \cdot r_2)$ |
1238 The above term can be rewritten as |
1235 simplifies to $\ZERO$, |
1239 \begin{center} |
1236 so the above term can be rewritten as |
1240 $ \simp \left[ \fuse \; \textit{bs}_2\; r_2 \backslash c_2 \right]$, |
1237 \begin{center} |
1241 \end{center} |
1238 \begin{tabular}{lcl} |
1242 which is equal to |
1239 $\textit{distinct}(\flatten[\ZERO\;, \; _{\textit{bs++bs1}}\oplus rs] ) \; \textit{match} $ \\ |
1243 \begin{center} |
|
1244 $\simp \left[ \fuse \; \textit{bs}_2 \; _{bs}{\oplus rs} \right]$\\ |
|
1245 $=\simp \left[ \; _{bs_2++bs}{\oplus rs} \right]$\\ |
|
1246 $= \; _{bs_2++bs}{\oplus \textit{rs}_1} $ |
|
1247 \end{center} |
|
1248 \noindent |
|
1249 by using the properties from the set $D$ again |
|
1250 and again(The reason why we set so many conditions |
|
1251 that the pair $(r,s)$ need to satisfy is because we can |
|
1252 rewrite them easily to construct the difference.) |
|
1253 |
|
1254 Now we compute ${\bf \simp(\rup \backslash s)}$: |
|
1255 \begin{center} |
|
1256 $\simp \big[(r_1\cdot r_2) \backslash [c_1c_2] \big]= \simp \left[ ((r_1 \cdot r_2 )\backslash c_1) \backslash c_2 \right]$ |
|
1257 \end{center} |
|
1258 \noindent |
|
1259 Again, using the properties above, we obtain |
|
1260 the following chain of equalities: |
|
1261 \begin{center} |
|
1262 $\simp(\rup \backslash s)= \simp \left[ ((r_1 \cdot r_2 )\backslash c_1) \backslash c_2 \right]= \simp\left[ \left(r_1 \backslash c_1\right) \cdot r_2 \big) \backslash c_2 \right]$\\ |
|
1263 $= \simp \left[ \oplus[\big( \left(r_1 \backslash c_1\right) \backslash c_2 \big) \cdot r_2 \; , \; \fuse \; (\bmkeps \;r_1\backslash c_1) \; r_2 \backslash c_2 ] \right]$, |
|
1264 \end{center} |
|
1265 \noindent |
|
1266 as before, we call the bitcode returned by |
|
1267 $\bmkeps(r_1\backslash c_1)$ as |
|
1268 $\textit{bs}_2$. |
|
1269 Also, $\simp(r_2 \backslash c_2)$ is |
|
1270 $_{bs}\oplus \textit{rs}_1$, |
|
1271 and $( \left(r_1 \backslash c_1\right) \backslash c_2 \cdot r_2)$ |
|
1272 simplifies to $\ZERO$, |
|
1273 so the above term can be expanded as |
|
1274 \begin{center} |
|
1275 \begin{tabular}{l} |
|
1276 $\textit{distinct}(\flatten[\ZERO\;, \; _{\textit{bs}_2++\textit{bs}}\oplus \textit{rs}_1] ) \; \textit{match} $ \\ |
1240 $\textit{case} \; [] \Rightarrow \ZERO$ \\ |
1277 $\textit{case} \; [] \Rightarrow \ZERO$ \\ |
1241 $\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\ |
1278 $\textit{case} \; a :: [] \Rightarrow \textit{\fuse \; \textit{bs} a}$ \\ |
1242 $\textit{case} \; as' \Rightarrow \textit{ALTS}\;bs\;as'$\\ |
1279 $\textit{case} \; as' \Rightarrow _{[]}\oplus as'$\\ |
1243 \end{tabular} |
1280 \end{tabular} |
1244 \end{center} |
1281 \end{center} |
1245 this, in turn, can be rewritten as $map (\fuse \textit{bs}++\textit{bs1}) rs$ based on |
1282 \noindent |
1246 the properties of the function $\simp$(more on this later). |
1283 Applying the definition of $\flatten$, we get |
1247 simplification is done along the way, disallowing the structure of nested alternatives, |
1284 \begin{center} |
1248 and this can actually be proven: simp(r) does not contain |
1285 $_{[]}\oplus (\textit{map} \; \fuse (\textit{bs}_2 ++ bs) rs_1)$ |
1249 nested alternatives. |
1286 \end{center} |
1250 \[ |
1287 \noindent |
1251 \simp \left[(r_1\cdot r_2) \backslash [c_1c_2] \right]= \simp \left[ ((r_1 \cdot r_2 )\backslash c_1) \backslash c_2 \right] |
1288 compared to the result |
1252 \] |
1289 \begin{center} |
1253 For $\simp(\rup \backslash s)$, |
1290 $ \; _{bs_2++bs}{\oplus \textit{rs}_1} $ |
1254 \begin{center} |
1291 \end{center} |
1255 \begin{tabular}{lcl} |
1292 \noindent |
1256 $\simp\big[(r_1\cdot r_2)\backslash \, [c_1c_2]\big]$ & $= \simp\left[ \big( \left( r_1\cdot r_2 \right) \backslash c_1 \big)\backslash c_2\right]$\\ |
1293 Note how these two regular expressions only |
1257 & $= \simp\left[ \left(r_1 \backslash c_1\right) \cdot r_2 \big) \backslash c_2 \right]$\\ |
1294 differ in terms of the position of the bits |
1258 & $= \simp \left[ \oplus[\big( \left(r_1 \backslash c_1\right) \backslash c_2 \big) \cdot r_2 \; , \; \fuse \; \bmkeps(r_1\backslash c_1) \; r_2 \backslash c_2 ] \right]$, |
1295 $\textit{bs}_2++\textit{bs}$. They are the same otherwise. |
1259 \end{tabular} |
1296 What caused this difference to happen? |
1260 \end{center} |
1297 The culprit is the $\flatten$ function, which spills |
1261 \noindent |
1298 out the bitcodes in the inner alternatives when |
1262 as before, we call the bitcode returned by $\bmkeps(r_1\backslash c_1)$ |
1299 there exists an outer alternative. |
1263 $bs$. Also, $\simp(r_2 \backslash c_2)$ is a certain alternative, |
1300 Note how the absence of simplification |
1264 we shall call it $_{bs_1}\oplus rs$, and $( \left(r_1 \backslash c_1\right) \backslash c_2 \cdot r_2)$ |
1301 caused $\simp(\rup \backslash s)$ to |
1265 simplifies to $\ZERO$, |
1302 generate the nested alternatives structure: |
1266 so the above term can be rewritten as |
1303 \begin{center} |
1267 \begin{center} |
1304 $ \oplus[\ZERO \;, \; _{bs}\oplus \textit{rs} ]$ |
1268 \begin{tabular}{lcl} |
1305 \end{center} |
1269 $\textit{distinct}(\flatten[\ZERO\;, \; _{\textit{bs}++\textit{bs1}}\oplus rs] ) \; \textit{match} $ \\ |
1306 and this will always trigger the $\flatten$ to |
1270 $\textit{case} \; [] \Rightarrow \ZERO$ \\ |
1307 spill out the sequence $\textit{bs}$, |
1271 $\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\ |
1308 whereas when |
1272 $\textit{case} \; as' \Rightarrow \textit{ALTS}\;bs\;as'$\\ |
1309 simplification is done along the way, |
1273 \end{tabular} |
1310 the structure of nested alternatives is never created(we can |
1274 \end{center} |
1311 actually prove that simplification function never allows nested |
1275 |
1312 alternatives to happen, more on this later). |
1276 this, in turn, can be rewritten as $map (\fuse\; \textit{bs}++\textit{bs1}) rs$ based on |
|
1277 the properties of the function $\simp$(more on this later). |
|
1278 |
|
1279 simplification is done along the way, disallowing the structure of nested alternatives, |
|
1280 and this can actually be proven: simp(r) does not contain |
|
1281 nested alternatives. |
|
1282 |
1313 |
1283 %CONSTRUCTION SITE HERE |
1314 %CONSTRUCTION SITE HERE |
1284 that is to say, despite the bits are being moved around on the regular expression |
1315 that is to say, despite the bits are being moved around on the regular expression |
1285 (difference in bits), the structure of the (unannotated)regular expression |
1316 (difference in bits), the structure of the (unannotated)regular expression |
1286 after one simplification is exactly the same after the |
1317 after one simplification is exactly the same after the |