346 make them compatible with the $\textit{inj}$-mechanism, we use |
350 make them compatible with the $\textit{inj}$-mechanism, we use |
347 \emph{bitcodes}. They were first introduced by Sulzmann and Lu. |
351 \emph{bitcodes}. They were first introduced by Sulzmann and Lu. |
348 Here bits and bitcodes (lists of bits) are defined as: |
352 Here bits and bitcodes (lists of bits) are defined as: |
349 |
353 |
350 \begin{center} |
354 \begin{center} |
351 $b ::= S \mid Z \qquad |
355 $b ::= 1 \mid 0 \qquad |
352 bs ::= [] \mid b:bs |
356 bs ::= [] \mid b:bs |
353 $ |
357 $ |
354 \end{center} |
358 \end{center} |
355 |
359 |
356 \noindent |
360 \noindent |
357 The $S$ and $Z$ are arbitrary names for the bits in order to avoid |
361 The $1$ and $0$ are not in bold in order to avoid |
358 confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or |
362 confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or |
359 bit-lists) can be used to encode values (or incomplete values) in a |
363 bit-lists) can be used to encode values (or incomplete values) in a |
360 compact form. This can be straightforwardly seen in the following |
364 compact form. This can be straightforwardly seen in the following |
361 coding function from values to bitcodes: |
365 coding function from values to bitcodes: |
362 |
366 |
363 \begin{center} |
367 \begin{center} |
364 \begin{tabular}{lcl} |
368 \begin{tabular}{lcl} |
365 $\textit{code}(\Empty)$ & $\dn$ & $[]$\\ |
369 $\textit{code}(\Empty)$ & $\dn$ & $[]$\\ |
366 $\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\ |
370 $\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\ |
367 $\textit{code}(\Left\,v)$ & $\dn$ & $\Z :: code(v)$\\ |
371 $\textit{code}(\Left\,v)$ & $\dn$ & $0 :: code(v)$\\ |
368 $\textit{code}(\Right\,v)$ & $\dn$ & $\S :: code(v)$\\ |
372 $\textit{code}(\Right\,v)$ & $\dn$ & $1 :: code(v)$\\ |
369 $\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\ |
373 $\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\ |
370 $\textit{code}(\Stars\,[])$ & $\dn$ & $[\Z]$\\ |
374 $\textit{code}(\Stars\,[])$ & $\dn$ & $[0]$\\ |
371 $\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $\S :: code(v) \;@\; |
375 $\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $1 :: code(v) \;@\; |
372 code(\Stars\,vs)$ |
376 code(\Stars\,vs)$ |
373 \end{tabular} |
377 \end{tabular} |
374 \end{center} |
378 \end{center} |
375 |
379 |
376 \noindent |
380 \noindent |
377 Here $\textit{code}$ encodes a value into a bitcodes by converting |
381 Here $\textit{code}$ encodes a value into a bitcodes by converting |
378 $\Left$ into $\Z$, $\Right$ into $\S$, the start point of a non-empty |
382 $\Left$ into $0$, $\Right$ into $1$, and marks the start of a non-empty |
379 star iteration into $\S$, and the border where a local star terminates |
383 star iteration by $1$. The border where a local star terminates |
380 into $\Z$. This coding is lossy, as it throws away the information about |
384 is marked by $0$. This coding is lossy, as it throws away the information about |
381 characters, and also does not encode the ``boundary'' between two |
385 characters, and also does not encode the ``boundary'' between two |
382 sequence values. Moreover, with only the bitcode we cannot even tell |
386 sequence values. Moreover, with only the bitcode we cannot even tell |
383 whether the $\S$s and $\Z$s are for $\Left/\Right$ or $\Stars$. The |
387 whether the $1$s and $0$s are for $\Left/\Right$ or $\Stars$. The |
384 reason for choosing this compact way of storing information is that the |
388 reason for choosing this compact way of storing information is that the |
385 relatively small size of bits can be easily manipulated and ``moved |
389 relatively small size of bits can be easily manipulated and ``moved |
386 around'' in a regular expression. In order to recover values, we will |
390 around'' in a regular expression. In order to recover values, we will |
387 need the corresponding regular expression as an extra information. This |
391 need the corresponding regular expression as an extra information. This |
388 means the decoding function is defined as: |
392 means the decoding function is defined as: |
1325 This idea can be realized by making the following |
1329 This idea can be realized by making the following |
1326 changes to the $\simp$-function: |
1330 changes to the $\simp$-function: |
1327 \begin{center} |
1331 \begin{center} |
1328 \begin{tabular}{@{}lcl@{}} |
1332 \begin{tabular}{@{}lcl@{}} |
1329 |
1333 |
1330 $\textit{simp} \; (_{\textit{bs}}(a_1 \cdot a_2))$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp} \; a_2) \; \textit{match} $ \\ |
1334 $\textit{simp}' \; (_{\textit{bs}}(a_1 \cdot a_2))$ & $\dn$ & $\textit{as} \; \simp \; \textit{was} \; \textit{before} $ \\ |
1331 &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\ |
1335 |
1332 &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\ |
1336 $\textit{simp}' \; (_{bs}\oplus as)$ & $\dn$ & \st{$\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $} \\ |
1333 &&$\quad\textit{case} \; (\ONE, a_2') \Rightarrow \textit{fuse} \; bs \; a_2'$ \\ |
1337 &&\st{$\quad\textit{case} \; [] \Rightarrow \ZERO$} \\ |
1334 &&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow \textit{fuse} \; bs \; a_1'$ \\ |
1338 &&\st{$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$} \\ |
1335 &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{\textit{bs}}(a_1' \cdot a_2')$ \\ |
1339 &&\st{$\quad\textit{case} \; as' \Rightarrow \textit{ALTS}\;bs\;as'$}\\ |
1336 |
1340 &&$\textit{if}(\textit{hollowAlternatives}( \textit{map \; simp \; as}))$\\ |
1337 $\textit{simp} \; (bs_\oplus as)$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $ \\ |
1341 &&$\textit{then} \; \fuse \; \textit{bs}\; \textit{extractAlt}(\textit{map} \; \simp \; \textit{as} )$\\ |
1338 &&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\ |
1342 &&$\textit{else} \; \simp(_{bs} \oplus \textit{as})$\\ |
1339 &&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\ |
1343 |
1340 &&$\quad\textit{case} \; as' \Rightarrow \textit{ALTS}\;bs\;as'$\\ |
1344 |
1341 |
1345 $\textit{simp}' \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$ |
1342 $\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$ |
|
1343 \end{tabular} |
1346 \end{tabular} |
1344 \end{center} |
1347 \end{center} |
1345 |
1348 |
1346 \noindent |
1349 \noindent |
1347 |
1350 given the definition of $\textit{hollowAlternatives}$ and $\textit{extractAlt}$ : |
|
1351 \begin{center} |
|
1352 $\textit{hollowAlternatives}( \textit{rs}) \dn |
|
1353 \exists r = (_{\textit{bs}_1}\oplus \textit{rs}_1) \in \textit{rs}. \forall r' \in \textit{rs}, \; |
|
1354 \textit{either} \; r' = \ZERO \; \textit{or} \; r' = r $ |
|
1355 $\textit{extractAlt}( \textit{rs}) \dn \textit{if}\big( |
|
1356 \exists r = (_{\textit{bs}_1}\oplus \textit{rs}_1) \in \textit{rs}. \forall r' \in \textit{rs}, \; |
|
1357 \textit{either} \; r' = \ZERO \; \textit{or} \; r' = r \big)\; \textit{then} \; \textit{return} \; r$ |
|
1358 \end{center} |
|
1359 \noindent |
|
1360 Basically, $\textit{hollowAlternatives}$ captures the feature of |
|
1361 a list of regular expression of the shape |
|
1362 \begin{center} |
|
1363 $ \oplus[\ZERO \;, \; _{bs}\oplus \textit{rs} ]$ |
|
1364 \end{center} |
|
1365 and this means we can simply elevate the |
|
1366 inner regular expression $_{bs}\oplus \textit{rs}$ |
|
1367 to the outmost |
|
1368 and throw away the useless $\ZERO$s and |
|
1369 the outer $\oplus$ wrapper. |
|
1370 Using this new definition of simp, |
|
1371 under the example where $r$ is the regular expression |
|
1372 $(a+b)(a+a*)$ and $s$ is the string $aa$ |
|
1373 the problem of $\rup\backslash_{simp} \, s \neq \simp(\rup\backslash s)$ |
|
1374 is resolved. |
|
1375 |
|
1376 Unfortunately this causes new problems: |
|
1377 for the counterexample where |
|
1378 \begin{center} |
|
1379 $r$ is the regular expression |
|
1380 $(ab+(a^*+aa))$ and $s$ is the string $aa$ |
|
1381 \end{center} |
|
1382 $\rup\backslash_{simp} \, s \neq \simp(\rup\backslash s)$ |
|
1383 happens again, whereas this does not happen for the old |
|
1384 version of $\simp$. |
|
1385 This dilemma causes this amendment not a successful |
|
1386 attempt to make $\rup\backslash_{simp} \, s = \simp(\rup\backslash s)$ |
|
1387 under every possible regular expression and string. |
1348 \subsection{Properties of the Function $\simp$} |
1388 \subsection{Properties of the Function $\simp$} |
1349 |
1389 |
1350 We have proved in Isabelle quite a few properties |
1390 We have proved in Isabelle quite a few properties |
1351 of the $\simp$-function, which helps the proof to go forward |
1391 of the $\simp$-function, which helps the proof to go forward |
1352 and we list them here to aid comprehension. |
1392 and we list them here to aid comprehension. |
1376 \item |
1416 \item |
1377 \begin{center} |
1417 \begin{center} |
1378 $\textit{if} r = \simp(r') \textit{then} \; \textit{good}(r) $ |
1418 $\textit{if} r = \simp(r') \textit{then} \; \textit{good}(r) $ |
1379 \end{center} |
1419 \end{center} |
1380 \end{itemize} |
1420 \end{itemize} |
1381 |
1421 \subsection{the Contains relation} |
|
1422 $\retrieve$ is a too strong relation in that |
|
1423 it only extracts one bitcode instead of a set of them. |
|
1424 Therefore we try to define another relation(predicate) |
|
1425 to capture the fact the regular expression has bits |
|
1426 being moved around but still has all the bits needed. |
|
1427 The contains symbol, written$\gg$, is a relation that |
|
1428 takes two arguments in an infix form |
|
1429 and returns a truth value. |
|
1430 |
|
1431 |
|
1432 In other words, from the set of regular expression and |
|
1433 bitcode pairs |
|
1434 $\textit{RV} = \{(r, v) \mid r \text{r is a regular expression, v is a value}\}$, |
|
1435 those that satisfy the following requirements are in the set |
|
1436 $\textit{RV}_Contains$. |
|
1437 Unlike the $\retrieve$ |
|
1438 function, which takes two arguments $r$ and $v$ and |
|
1439 produces an only answer $\textit{bs}$, it takes only |
|
1440 one argument $r$ and returns a set of bitcodes that |
|
1441 can be generated by $r$. |
|
1442 \begin{center} |
|
1443 \begin{tabular}{llclll} |
|
1444 & & & $_{bs}\ONE$ & $\gg$ & $\textit{bs}$\\ |
|
1445 & & & $_{bs}{\bf c}$ & $\gg$ & $\textit{bs}$\\ |
|
1446 $\textit{if} \; r_1 \gg \textit{bs}_1$ & $r_2 \; \gg \textit{bs}_2$ |
|
1447 & $\textit{then}$ & |
|
1448 $_{bs}{r_1 \cdot r_2}$ & |
|
1449 $\gg$ & |
|
1450 $\textit{bs} @ \textit{bs}_1 @ \textit{bs}_2$\\ |
|
1451 |
|
1452 $\textit{if} \; r \gg \textit{bs}_1$ & & $\textit{then}$ & |
|
1453 $_{bs}{\oplus(r :: \textit{rs}})$ & |
|
1454 $\gg$ & |
|
1455 $\textit{bs} @ \textit{bs}_1 $\\ |
|
1456 |
|
1457 $\textit{if} \; _{bs}(\oplus \textit{rs}) \gg \textit{bs} @ \textit{bs}_1$ & & $\textit{then}$ & |
|
1458 $_{bs}{\oplus(r :: \textit{rs}})$ & |
|
1459 $\gg$ & |
|
1460 $\textit{bs} @ \textit{bs}_1 $\\ |
|
1461 |
|
1462 $\textit{if} \; r \gg \textit{bs}_1\; \textit{and}$ & $_{bs}r^* \gg \textit{bs} @ \textit{bs}_2$ & $\textit{then}$ & |
|
1463 $_{bs}r^* $ & |
|
1464 $\gg$ & |
|
1465 $\textit{bs} @ [0] @ \textit{bs}_1@ \textit{bs}_2 $\\ |
|
1466 |
|
1467 & & & $_{bs}r^*$ & $\gg$ & $\textit{bs} @ [1]$\\ |
|
1468 \end{tabular} |
|
1469 \end{center} |
|
1470 It is a predicate in the sense that given |
|
1471 a regular expression and a bitcode, it |
|
1472 returns true or false, depending on whether |
|
1473 or not the regular expression can actually produce that |
|
1474 value. If the predicates returns a true, then |
|
1475 we say that the regular expression $r$ contains |
|
1476 the bitcode $\textit{bs}$, written |
|
1477 $r \gg \textit{bs}$. |
|
1478 The $\gg$ operator with the |
|
1479 regular expression $r$ may also be seen as a |
|
1480 machine that does a derivative of regular expressions |
|
1481 on all strings simultaneously, taking |
|
1482 the bits by going throught the regular expression tree |
|
1483 structure in a depth first manner, regardless of whether |
|
1484 the part being traversed is nullable or not. |
|
1485 It put all possible bits that can be produced on such a traversal |
|
1486 into a set. |
|
1487 For example, if we are given the regular expression |
|
1488 $((a+b)(c+d))^*$, the tree structure may be written as |
|
1489 \begin{center} |
|
1490 \begin{tikzpicture} |
|
1491 \tikz[tree layout]\graph[nodes={draw, circle}] { |
|
1492 * -> |
|
1493 {@-> { |
|
1494 {+1 -> |
|
1495 {a , b} |
|
1496 }, |
|
1497 {+ -> |
|
1498 {c , d } |
|
1499 } |
|
1500 } |
|
1501 } |
|
1502 }; |
|
1503 \end{tikzpicture} |
|
1504 \end{center} |
|
1505 \subsection{the $\textit{ders}_2$ Function} |
|
1506 If we want to prove the result |
|
1507 \begin{center} |
|
1508 $ \textit{blexer}\_{simp}(r, \; s) = \textit{blexer}(r, \; s)$ |
|
1509 \end{center} |
|
1510 inductively |
|
1511 on the structure of the regular expression, |
|
1512 then we need to induct on the case $r_1 \cdot r_2$, |
|
1513 it can be good if we could express $(r_1 \cdot r_2) \backslash s$ |
|
1514 in terms of $r_1 \backslash s$ and $r_2 \backslash s$, |
|
1515 and this naturally induces the $ders2$ function, |
|
1516 which does a "convolution" on $r_1$ and $r_2$ using the string |
|
1517 $s$. |
|
1518 It is based on the observation that the derivative of $r_1 \cdot r_2$ |
|
1519 with respect to a string $s$ can actually be written in an "explicit form" |
|
1520 composed of $r_1$'s derivative of $s$ and $r_2$'s derivative of $s$. |
|
1521 This can be illustrated in the following procedure execution |
|
1522 \begin{center} |
|
1523 $ (r_1 \cdot r_2) \backslash [c_1c_2] = (\textit{if} \; \nullable(r_1)\; \textit{then} \; ((r_1 \backslash c_1) \cdot r_2 + r_2 \backslash c_1) \; \textit{else} \; (r_1\backslash c_1) \cdot r_2) \backslash c_2$ |
|
1524 \end{center} |
|
1525 which can also be written in a "convoluted sum" |
|
1526 format: |
|
1527 \begin{center} |
|
1528 $ (r_1 \cdot r_2) \backslash [c_1c_2] = \sum{r_1 \backslash s_i \cdot r_2 \backslash s_j}$ |
|
1529 \end{center} |
|
1530 In a more serious manner, we should write |
|
1531 \begin{center} |
|
1532 $ (r_1 \cdot r_2) \backslash [c_1c_2] = \sum{r_1 \backslash s_i \cdot r_2 \backslash s_j}$ |
|
1533 \end{center} |
|
1534 Note this differentiates from the previous form in the sense that |
|
1535 it calculates the results $r_1\backslash s_i , r_2 \backslash s_j$ first and then glue them together |
|
1536 through nested alternatives whereas the $r_1 \cdot r_2 \backslash s$ procedure, |
|
1537 used by algorithm $\lexer$, can only produce each component of the |
|
1538 resulting alternatives regular expression altogether rather than |
|
1539 generating each of the children nodes one by one |
|
1540 n a single recursive call that is only for generating that |
|
1541 very expression itself. |
|
1542 We have this |
|
1543 \section{Conclusion} |
|
1544 Under the exhaustive tests we believe the main |
|
1545 result holds, yet a proof still seems elusive. |
|
1546 We have tried out different approaches, and |
|
1547 found a lot of properties of the function $\simp$. |
|
1548 The counterexamples where $\rup\backslash_{simp} \, s \neq \simp(\rup\backslash s)$ |
|
1549 are also valuable in the sense that |
|
1550 we get to know better why they are not equal and what |
|
1551 are the subtle differences between a |
|
1552 nested simplified regular expression and a |
|
1553 regular expression that is simplified at the final moment. |
|
1554 We are almost there, but a last step is needed to make the proof work. |
|
1555 Hopefully in the next few weeks we will be able to find one. |
1382 %CONSTRUCTION SITE HERE |
1556 %CONSTRUCTION SITE HERE |
1383 that is to say, despite the bits are being moved around on the regular expression |
1557 that is to say, despite the bits are being moved around on the regular expression |
1384 (difference in bits), the structure of the (unannotated)regular expression |
1558 (difference in bits), the structure of the (unannotated)regular expression |
1385 after one simplification is exactly the same after the |
1559 after one simplification is exactly the same after the |
1386 same sequence of derivative operations |
1560 same sequence of derivative operations |
2264 %data, that a similar bound can be obtained for the derivatives in |
2438 %data, that a similar bound can be obtained for the derivatives in |
2265 %Sulzmann and Lu's algorithm. Let us give some details about this next. |
2439 %Sulzmann and Lu's algorithm. Let us give some details about this next. |
2266 |
2440 |
2267 |
2441 |
2268 \begin{center} |
2442 \begin{center} |
2269 $b ::= S \mid Z \qquad |
2443 $b ::= 0 \mid 1 \qquad |
2270 bs ::= [] \mid b:bs |
2444 bs ::= [] \mid b:bs |
2271 $ |
2445 $ |
2272 \end{center} |
2446 \end{center} |
2273 |
2447 |
2274 \noindent |
2448 \noindent |
2275 The $S$ and $Z$ are arbitrary names for the bits in order to avoid |
2449 The $0$ and $1$ are arbitrary names for the bits and not in bold |
|
2450 in order to avoid |
2276 confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or |
2451 confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or |
2277 bit-lists) can be used to encode values (or incomplete values) in a |
2452 bit-lists) can be used to encode values (or incomplete values) in a |
2278 compact form. This can be straightforwardly seen in the following |
2453 compact form. This can be straightforwardly seen in the following |
2279 coding function from values to bitcodes: |
2454 coding function from values to bitcodes: |
2280 |
2455 |
2281 \begin{center} |
2456 \begin{center} |
2282 \begin{tabular}{lcl} |
2457 \begin{tabular}{lcl} |
2283 $\textit{code}(\Empty)$ & $\dn$ & $[]$\\ |
2458 $\textit{code}(\Empty)$ & $\dn$ & $[]$\\ |
2284 $\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\ |
2459 $\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\ |
2285 $\textit{code}(\Left\,v)$ & $\dn$ & $\Z :: code(v)$\\ |
2460 $\textit{code}(\Left\,v)$ & $\dn$ & $0 :: code(v)$\\ |
2286 $\textit{code}(\Right\,v)$ & $\dn$ & $\S :: code(v)$\\ |
2461 $\textit{code}(\Right\,v)$ & $\dn$ & $1 :: code(v)$\\ |
2287 $\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\ |
2462 $\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\ |
2288 $\textit{code}(\Stars\,[])$ & $\dn$ & $[\Z]$\\ |
2463 $\textit{code}(\Stars\,[])$ & $\dn$ & $[0]$\\ |
2289 $\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $\S :: code(v) \;@\; |
2464 $\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $1 :: code(v) \;@\; |
2290 code(\Stars\,vs)$ |
2465 code(\Stars\,vs)$ |
2291 \end{tabular} |
2466 \end{tabular} |
2292 \end{center} |
2467 \end{center} |
2293 |
2468 |
2294 \noindent |
2469 \noindent |
2295 Here $\textit{code}$ encodes a value into a bitcodes by converting |
2470 Here $\textit{code}$ encodes a value into a bitcodes by converting |
2296 $\Left$ into $\Z$, $\Right$ into $\S$, the start point of a non-empty |
2471 $\Left$ into $0$, $\Right$ into $1$, the start point of a non-empty |
2297 star iteration into $\S$, and the border where a local star terminates |
2472 star iteration into $\S$, and the border where a local star terminates |
2298 into $\Z$. This coding is lossy, as it throws away the information about |
2473 into $0$. This coding is lossy, as it throws away the information about |
2299 characters, and also does not encode the ``boundary'' between two |
2474 characters, and also does not encode the ``boundary'' between two |
2300 sequence values. Moreover, with only the bitcode we cannot even tell |
2475 sequence values. Moreover, with only the bitcode we cannot even tell |
2301 whether the $\S$s and $\Z$s are for $\Left/\Right$ or $\Stars$. The |
2476 whether the $1$s and $0$s are for $\Left/\Right$ or $\Stars$. The |
2302 reason for choosing this compact way of storing information is that the |
2477 reason for choosing this compact way of storing information is that the |
2303 relatively small size of bits can be easily manipulated and ``moved |
2478 relatively small size of bits can be easily manipulated and ``moved |
2304 around'' in a regular expression. In order to recover values, we will |
2479 around'' in a regular expression. In order to recover values, we will |
2305 need the corresponding regular expression as an extra information. This |
2480 need the corresponding regular expression as an extra information. This |
2306 means the decoding function is defined as: |
2481 means the decoding function is defined as: |
2309 %\begin{definition}[Bitdecoding of Values]\mbox{} |
2484 %\begin{definition}[Bitdecoding of Values]\mbox{} |
2310 \begin{center} |
2485 \begin{center} |
2311 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}} |
2486 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}} |
2312 $\textit{decode}'\,bs\,(\ONE)$ & $\dn$ & $(\Empty, bs)$\\ |
2487 $\textit{decode}'\,bs\,(\ONE)$ & $\dn$ & $(\Empty, bs)$\\ |
2313 $\textit{decode}'\,bs\,(c)$ & $\dn$ & $(\Char\,c, bs)$\\ |
2488 $\textit{decode}'\,bs\,(c)$ & $\dn$ & $(\Char\,c, bs)$\\ |
2314 $\textit{decode}'\,(\Z\!::\!bs)\;(r_1 + r_2)$ & $\dn$ & |
2489 $\textit{decode}'\,(0\!::\!bs)\;(r_1 + r_2)$ & $\dn$ & |
2315 $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}\; |
2490 $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}\; |
2316 (\Left\,v, bs_1)$\\ |
2491 (\Left\,v, bs_1)$\\ |
2317 $\textit{decode}'\,(\S\!::\!bs)\;(r_1 + r_2)$ & $\dn$ & |
2492 $\textit{decode}'\,(1\!::\!bs)\;(r_1 + r_2)$ & $\dn$ & |
2318 $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_2\;\textit{in}\; |
2493 $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_2\;\textit{in}\; |
2319 (\Right\,v, bs_1)$\\ |
2494 (\Right\,v, bs_1)$\\ |
2320 $\textit{decode}'\,bs\;(r_1\cdot r_2)$ & $\dn$ & |
2495 $\textit{decode}'\,bs\;(r_1\cdot r_2)$ & $\dn$ & |
2321 $\textit{let}\,(v_1, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}$\\ |
2496 $\textit{let}\,(v_1, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}$\\ |
2322 & & $\textit{let}\,(v_2, bs_2) = \textit{decode}'\,bs_1\,r_2$\\ |
2497 & & $\textit{let}\,(v_2, bs_2) = \textit{decode}'\,bs_1\,r_2$\\ |
2323 & & \hspace{35mm}$\textit{in}\;(\Seq\,v_1\,v_2, bs_2)$\\ |
2498 & & \hspace{35mm}$\textit{in}\;(\Seq\,v_1\,v_2, bs_2)$\\ |
2324 $\textit{decode}'\,(\Z\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\ |
2499 $\textit{decode}'\,(0\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\ |
2325 $\textit{decode}'\,(\S\!::\!bs)\,(r^*)$ & $\dn$ & |
2500 $\textit{decode}'\,(1\!::\!bs)\,(r^*)$ & $\dn$ & |
2326 $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\ |
2501 $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\ |
2327 & & $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$\\ |
2502 & & $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$\\ |
2328 & & \hspace{35mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\bigskip\\ |
2503 & & \hspace{35mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\bigskip\\ |
2329 |
2504 |
2330 $\textit{decode}\,bs\,r$ & $\dn$ & |
2505 $\textit{decode}\,bs\,r$ & $\dn$ & |
2340 \emph{Annotated regular expressions} are defined by the following |
2515 \emph{Annotated regular expressions} are defined by the following |
2341 grammar:%\comment{ALTS should have an $as$ in the definitions, not just $a_1$ and $a_2$} |
2516 grammar:%\comment{ALTS should have an $as$ in the definitions, not just $a_1$ and $a_2$} |
2342 |
2517 |
2343 \begin{center} |
2518 \begin{center} |
2344 \begin{tabular}{lcl} |
2519 \begin{tabular}{lcl} |
2345 $\textit{a}$ & $::=$ & $\textit{ZERO}$\\ |
2520 $\textit{a}$ & $::=$ & $\ZERO$\\ |
2346 & $\mid$ & $\textit{ONE}\;\;bs$\\ |
2521 & $\mid$ & $_{bs}\ONE$\\ |
2347 & $\mid$ & $\textit{CHAR}\;\;bs\,c$\\ |
2522 & $\mid$ & $_{bs}{\bf c}$\\ |
2348 & $\mid$ & $\textit{ALTS}\;\;bs\,as$\\ |
2523 & $\mid$ & $_{bs}\oplus \textit{as}$\\ |
2349 & $\mid$ & $\textit{SEQ}\;\;bs\,a_1\,a_2$\\ |
2524 & $\mid$ & $_{bs}a_1\cdot a_2$\\ |
2350 & $\mid$ & $\textit{STAR}\;\;bs\,a$ |
2525 & $\mid$ & $_{bs}a^*$ |
2351 \end{tabular} |
2526 \end{tabular} |
2352 \end{center} |
2527 \end{center} |
2353 %(in \textit{ALTS}) |
2528 %(in \textit{ALTS}) |
2354 |
2529 |
2355 \noindent |
2530 \noindent |
2356 where $bs$ stands for bitcodes, $a$ for $\bold{a}$nnotated regular |
2531 where $\textit{bs}$ stands for bitcodes, $a$ for $\textit{annotated}$ regular |
2357 expressions and $as$ for a list of annotated regular expressions. |
2532 expressions and $\textit{as}$ for a list of annotated regular expressions. |
2358 The alternative constructor($\textit{ALTS}$) has been generalized to |
2533 The alternative constructor($\oplus$) has been generalized to |
2359 accept a list of annotated regular expressions rather than just 2. |
2534 accept a list of annotated regular expressions rather than just 2. |
2360 We will show that these bitcodes encode information about |
2535 We will show that these bitcodes encode information about |
2361 the (POSIX) value that should be generated by the Sulzmann and Lu |
2536 the (POSIX) value that should be generated by the Sulzmann and Lu |
2362 algorithm. |
2537 algorithm. |
2363 |
2538 |
2368 defined as follows: |
2543 defined as follows: |
2369 |
2544 |
2370 %\begin{definition} |
2545 %\begin{definition} |
2371 \begin{center} |
2546 \begin{center} |
2372 \begin{tabular}{lcl} |
2547 \begin{tabular}{lcl} |
2373 $(\ZERO)^\uparrow$ & $\dn$ & $\textit{ZERO}$\\ |
2548 $(\ZERO)^\uparrow$ & $\dn$ & $\ZERO$\\ |
2374 $(\ONE)^\uparrow$ & $\dn$ & $\textit{ONE}\,[]$\\ |
2549 $(\ONE)^\uparrow$ & $\dn$ & $_{[]}\ONE$\\ |
2375 $(c)^\uparrow$ & $\dn$ & $\textit{CHAR}\,[]\,c$\\ |
2550 $(c)^\uparrow$ & $\dn$ & $_{bs}\textit{\bf c}$\\ |
2376 $(r_1 + r_2)^\uparrow$ & $\dn$ & |
2551 $(r_1 + r_2)^\uparrow$ & $\dn$ & |
2377 $\textit{ALTS}\;[]\,List((\textit{fuse}\,[\Z]\,r_1^\uparrow),\, |
2552 $_{[]}\oplus\,[(\textit{fuse}\,[0]\,r_1^\uparrow),\, |
2378 (\textit{fuse}\,[\S]\,r_2^\uparrow))$\\ |
2553 (\textit{fuse}\,[1]\,r_2^\uparrow)]$\\ |
2379 $(r_1\cdot r_2)^\uparrow$ & $\dn$ & |
2554 $(r_1\cdot r_2)^\uparrow$ & $\dn$ & |
2380 $\textit{SEQ}\;[]\,r_1^\uparrow\,r_2^\uparrow$\\ |
2555 $_{[]}r_1^\uparrow \cdot r_2^\uparrow$\\ |
2381 $(r^*)^\uparrow$ & $\dn$ & |
2556 $(r^*)^\uparrow$ & $\dn$ & |
2382 $\textit{STAR}\;[]\,r^\uparrow$\\ |
2557 $_{[]}r^\uparrow$\\ |
2383 \end{tabular} |
2558 \end{tabular} |
2384 \end{center} |
2559 \end{center} |
2385 %\end{definition} |
2560 %\end{definition} |
2386 |
2561 |
2387 \noindent |
2562 \noindent |
2391 attach bits to the front of an annotated regular expression. Its |
2566 attach bits to the front of an annotated regular expression. Its |
2392 definition is as follows: |
2567 definition is as follows: |
2393 |
2568 |
2394 \begin{center} |
2569 \begin{center} |
2395 \begin{tabular}{lcl} |
2570 \begin{tabular}{lcl} |
2396 $\textit{fuse}\;bs\,(\textit{ZERO})$ & $\dn$ & $\textit{ZERO}$\\ |
2571 $\textit{fuse}\;bs\,(\ZERO)$ & $\dn$ & $\textit{ZERO}$\\ |
2397 $\textit{fuse}\;bs\,(\textit{ONE}\,bs')$ & $\dn$ & |
2572 $\textit{fuse}\;bs\,(\textit{ONE}\,bs')$ & $\dn$ & |
2398 $\textit{ONE}\,(bs\,@\,bs')$\\ |
2573 $\textit{ONE}\,(bs\,@\,bs')$\\ |
2399 $\textit{fuse}\;bs\,(\textit{CHAR}\,bs'\,c)$ & $\dn$ & |
2574 $\textit{fuse}\;bs\,(\textit{CHAR}\,bs'\,c)$ & $\dn$ & |
2400 $\textit{CHAR}\,(bs\,@\,bs')\,c$\\ |
2575 $\textit{CHAR}\,(bs\,@\,bs')\,c$\\ |
2401 $\textit{fuse}\;bs\,(\textit{ALTS}\,bs'\,as)$ & $\dn$ & |
2576 $\textit{fuse}\;bs\,(\textit{ALTS}\,bs'\,as)$ & $\dn$ & |