etnms/etnms.tex
changeset 114 dd7f719c451d
parent 113 8dbc83ecea3b
child 115 5c8afe4a8090
equal deleted inserted replaced
113:8dbc83ecea3b 114:dd7f719c451d
     1 \documentclass[a4paper,UKenglish]{lipics}
     1 \documentclass[a4paper,UKenglish]{lipics}
     2 \usepackage{graphic}
     2 \usepackage{graphic}
     3 \usepackage{data}
     3 \usepackage{data}
     4 \usepackage{tikz-cd}
     4 \usepackage{tikz-cd}
       
     5 \usepackage{tikz}
       
     6 \usetikzlibrary{graphs}
       
     7 \usetikzlibrary{graphdrawing}
       
     8 \usegdlibrary{trees}
     5 %\usepackage{algorithm}
     9 %\usepackage{algorithm}
     6 \usepackage{amsmath}
    10 \usepackage{amsmath}
     7 \usepackage{xcolor}
    11 \usepackage{xcolor}
     8 \usepackage[noend]{algpseudocode}
    12 \usepackage[noend]{algpseudocode}
     9 \usepackage{enumitem}
    13 \usepackage{enumitem}
   346 make them compatible with the $\textit{inj}$-mechanism, we use
   350 make them compatible with the $\textit{inj}$-mechanism, we use
   347 \emph{bitcodes}. They were first introduced by Sulzmann and Lu.
   351 \emph{bitcodes}. They were first introduced by Sulzmann and Lu.
   348 Here bits and bitcodes (lists of bits) are defined as:
   352 Here bits and bitcodes (lists of bits) are defined as:
   349 
   353 
   350 \begin{center}
   354 \begin{center}
   351 		$b ::=   S \mid  Z \qquad
   355 		$b ::=   1 \mid  0 \qquad
   352 bs ::= [] \mid b:bs    
   356 bs ::= [] \mid b:bs    
   353 $
   357 $
   354 \end{center}
   358 \end{center}
   355 
   359 
   356 \noindent
   360 \noindent
   357 The $S$ and $Z$ are arbitrary names for the bits in order to avoid 
   361 The $1$ and $0$ are not in bold in order to avoid 
   358 confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or
   362 confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or
   359 bit-lists) can be used to encode values (or incomplete values) in a
   363 bit-lists) can be used to encode values (or incomplete values) in a
   360 compact form. This can be straightforwardly seen in the following
   364 compact form. This can be straightforwardly seen in the following
   361 coding function from values to bitcodes: 
   365 coding function from values to bitcodes: 
   362 
   366 
   363 \begin{center}
   367 \begin{center}
   364 \begin{tabular}{lcl}
   368 \begin{tabular}{lcl}
   365   $\textit{code}(\Empty)$ & $\dn$ & $[]$\\
   369   $\textit{code}(\Empty)$ & $\dn$ & $[]$\\
   366   $\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\
   370   $\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\
   367   $\textit{code}(\Left\,v)$ & $\dn$ & $\Z :: code(v)$\\
   371   $\textit{code}(\Left\,v)$ & $\dn$ & $0 :: code(v)$\\
   368   $\textit{code}(\Right\,v)$ & $\dn$ & $\S :: code(v)$\\
   372   $\textit{code}(\Right\,v)$ & $\dn$ & $1 :: code(v)$\\
   369   $\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\
   373   $\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\
   370   $\textit{code}(\Stars\,[])$ & $\dn$ & $[\Z]$\\
   374   $\textit{code}(\Stars\,[])$ & $\dn$ & $[0]$\\
   371   $\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $\S :: code(v) \;@\;
   375   $\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $1 :: code(v) \;@\;
   372                                                  code(\Stars\,vs)$
   376                                                  code(\Stars\,vs)$
   373 \end{tabular}    
   377 \end{tabular}    
   374 \end{center} 
   378 \end{center} 
   375 
   379 
   376 \noindent
   380 \noindent
   377 Here $\textit{code}$ encodes a value into a bitcodes by converting
   381 Here $\textit{code}$ encodes a value into a bitcodes by converting
   378 $\Left$ into $\Z$, $\Right$ into $\S$, the start point of a non-empty
   382 $\Left$ into $0$, $\Right$ into $1$, and marks the start of a non-empty
   379 star iteration into $\S$, and the border where a local star terminates
   383 star iteration by $1$. The border where a local star terminates
   380 into $\Z$. This coding is lossy, as it throws away the information about
   384 is marked by $0$. This coding is lossy, as it throws away the information about
   381 characters, and also does not encode the ``boundary'' between two
   385 characters, and also does not encode the ``boundary'' between two
   382 sequence values. Moreover, with only the bitcode we cannot even tell
   386 sequence values. Moreover, with only the bitcode we cannot even tell
   383 whether the $\S$s and $\Z$s are for $\Left/\Right$ or $\Stars$. The
   387 whether the $1$s and $0$s are for $\Left/\Right$ or $\Stars$. The
   384 reason for choosing this compact way of storing information is that the
   388 reason for choosing this compact way of storing information is that the
   385 relatively small size of bits can be easily manipulated and ``moved
   389 relatively small size of bits can be easily manipulated and ``moved
   386 around'' in a regular expression. In order to recover values, we will 
   390 around'' in a regular expression. In order to recover values, we will 
   387 need the corresponding regular expression as an extra information. This
   391 need the corresponding regular expression as an extra information. This
   388 means the decoding function is defined as:
   392 means the decoding function is defined as:
  1325 This idea can be realized by making the following
  1329 This idea can be realized by making the following
  1326 changes to the $\simp$-function:
  1330 changes to the $\simp$-function:
  1327 \begin{center}
  1331 \begin{center}
  1328   \begin{tabular}{@{}lcl@{}}
  1332   \begin{tabular}{@{}lcl@{}}
  1329    
  1333    
  1330   $\textit{simp} \; (_{\textit{bs}}(a_1 \cdot a_2))$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp}  \; a_2) \; \textit{match} $ \\
  1334   $\textit{simp}' \; (_{\textit{bs}}(a_1 \cdot a_2))$ & $\dn$ & $\textit{as} \; \simp \; \textit{was} \; \textit{before} $ \\
  1331    &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
  1335 
  1332    &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
  1336   $\textit{simp}' \; (_{bs}\oplus as)$ & $\dn$ & \st{$\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $} \\
  1333    &&$\quad\textit{case} \;  (\ONE, a_2') \Rightarrow  \textit{fuse} \; bs \;  a_2'$ \\
  1337   &&\st{$\quad\textit{case} \; [] \Rightarrow  \ZERO$} \\
  1334    &&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow  \textit{fuse} \; bs \;  a_1'$ \\
  1338    &&\st{$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$} \\
  1335    &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow  _{\textit{bs}}(a_1' \cdot a_2')$ \\
  1339    &&\st{$\quad\textit{case} \;  as' \Rightarrow  \textit{ALTS}\;bs\;as'$}\\ 
  1336 
  1340    &&$\textit{if}(\textit{hollowAlternatives}( \textit{map \; simp \; as}))$\\
  1337   $\textit{simp} \; (bs_\oplus as)$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $ \\
  1341    &&$\textit{then} \; \fuse  \; \textit{bs}\; \textit{extractAlt}(\textit{map} \; \simp \; \textit{as} )$\\
  1338   &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
  1342    &&$\textit{else} \; \simp(_{bs} \oplus \textit{as})$\\
  1339    &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
  1343    
  1340    &&$\quad\textit{case} \;  as' \Rightarrow  \textit{ALTS}\;bs\;as'$\\ 
  1344 
  1341 
  1345    $\textit{simp}' \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$   
  1342    $\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$   
       
  1343 \end{tabular}    
  1346 \end{tabular}    
  1344 \end{center}    
  1347 \end{center}    
  1345 
  1348 
  1346 \noindent
  1349 \noindent
  1347 
  1350 given the definition of $\textit{hollowAlternatives}$ and  $\textit{extractAlt}$ :
       
  1351 \begin{center}
       
  1352 $\textit{hollowAlternatives}( \textit{rs}) \dn 
       
  1353 \exists r = (_{\textit{bs}_1}\oplus \textit{rs}_1)  \in \textit{rs}.  \forall r' \in \textit{rs}, \;
       
  1354 \textit{either} \; r' = \ZERO \; \textit{or} \; r' = r $
       
  1355 $\textit{extractAlt}( \textit{rs}) \dn \textit{if}\big(
       
  1356 \exists r = (_{\textit{bs}_1}\oplus \textit{rs}_1)  \in \textit{rs}.  \forall r' \in \textit{rs}, \;
       
  1357 \textit{either} \; r' = \ZERO \; \textit{or} \; r' = r \big)\; \textit{then} \; \textit{return} \; r$
       
  1358 \end{center}
       
  1359 \noindent
       
  1360 Basically, $\textit{hollowAlternatives}$ captures the feature of
       
  1361 a list of regular expression of the shape 
       
  1362 \begin{center}
       
  1363 $  \oplus[\ZERO \;, \; _{bs}\oplus \textit{rs} ]$
       
  1364 \end{center}
       
  1365 and this means we can simply elevate the 
       
  1366 inner regular expression $_{bs}\oplus \textit{rs}$
       
  1367  to the outmost
       
  1368 and throw away the useless $\ZERO$s and
       
  1369 the outer $\oplus$ wrapper.
       
  1370 Using this new definition of simp, 
       
  1371 under the example where  $r$ is the regular expression
       
  1372 $(a+b)(a+a*)$ and $s$  is the string $aa$
       
  1373 the problem of $\rup\backslash_{simp} \, s \neq \simp(\rup\backslash s)$
       
  1374 is resolved.
       
  1375 
       
  1376 Unfortunately this causes new problems:
       
  1377 for the counterexample where 
       
  1378 \begin{center}
       
  1379 $r$ is the regular expression
       
  1380 $(ab+(a^*+aa))$ and $s$  is the string $aa$
       
  1381 \end{center}
       
  1382 $\rup\backslash_{simp} \, s \neq \simp(\rup\backslash s)$
       
  1383 happens again, whereas this does not happen for the old
       
  1384 version of $\simp$.
       
  1385 This dilemma causes this amendment not a successful 
       
  1386 attempt to make $\rup\backslash_{simp} \, s = \simp(\rup\backslash s)$
       
  1387 under every possible regular expression and string.
  1348 \subsection{Properties of the Function $\simp$}
  1388 \subsection{Properties of the Function $\simp$}
  1349 
  1389 
  1350 We have proved in Isabelle quite a few properties
  1390 We have proved in Isabelle quite a few properties
  1351 of the $\simp$-function, which helps the proof to go forward
  1391 of the $\simp$-function, which helps the proof to go forward
  1352 and we list them here to aid comprehension.
  1392 and we list them here to aid comprehension.
  1376 \item
  1416 \item
  1377 \begin{center}
  1417 \begin{center}
  1378 $\textit{if} r = \simp(r') \textit{then} \; \textit{good}(r) $
  1418 $\textit{if} r = \simp(r') \textit{then} \; \textit{good}(r) $
  1379 \end{center}
  1419 \end{center}
  1380 \end{itemize}
  1420 \end{itemize}
  1381 
  1421 \subsection{the Contains relation}
       
  1422 $\retrieve$ is a too strong relation in that
       
  1423 it only extracts one bitcode instead of a set of them.
       
  1424 Therefore we try to define another relation(predicate)
       
  1425 to capture the fact the regular expression has bits
       
  1426 being moved around but still has all the bits needed.
       
  1427 The contains symbol, written$\gg$, is  a relation that
       
  1428 takes two arguments in an infix form 
       
  1429 and returns a truth value. 
       
  1430 
       
  1431 
       
  1432 In other words, from the set of regular expression and 
       
  1433 bitcode pairs 
       
  1434 $\textit{RV} = \{(r, v) \mid r \text{r is a regular expression, v is a value}\}$,
       
  1435 those that satisfy the following requirements are in the set
       
  1436 $\textit{RV}_Contains$.
       
  1437 Unlike the $\retrieve$
       
  1438 function, which takes two arguments $r$ and $v$ and 
       
  1439 produces an only answer $\textit{bs}$, it takes only 
       
  1440 one argument $r$ and returns a set of bitcodes that 
       
  1441 can be generated by $r$.
       
  1442 \begin{center}
       
  1443 \begin{tabular}{llclll}
       
  1444 & & & $_{bs}\ONE$ & $\gg$ & $\textit{bs}$\\
       
  1445 & & & $_{bs}{\bf c}$ & $\gg$ & $\textit{bs}$\\
       
  1446 $\textit{if} \; r_1 \gg \textit{bs}_1$ & $r_2 \; \gg \textit{bs}_2$ 
       
  1447 & $\textit{then}$  &
       
  1448  $_{bs}{r_1 \cdot r_2}$ & 
       
  1449  $\gg$ & 
       
  1450  $\textit{bs} @ \textit{bs}_1 @ \textit{bs}_2$\\
       
  1451 
       
  1452  $\textit{if} \; r \gg \textit{bs}_1$ &  & $\textit{then}$  &
       
  1453  $_{bs}{\oplus(r :: \textit{rs}})$ & 
       
  1454  $\gg$ & 
       
  1455  $\textit{bs} @ \textit{bs}_1 $\\ 
       
  1456 
       
  1457  $\textit{if} \; _{bs}(\oplus \textit{rs}) \gg \textit{bs} @ \textit{bs}_1$ &  & $\textit{then}$  &
       
  1458  $_{bs}{\oplus(r :: \textit{rs}})$ & 
       
  1459  $\gg$ & 
       
  1460  $\textit{bs} @ \textit{bs}_1 $\\  
       
  1461 
       
  1462  $\textit{if} \; r \gg \textit{bs}_1\; \textit{and}$ &  $_{bs}r^* \gg \textit{bs} @ \textit{bs}_2$ & $\textit{then}$  &
       
  1463  $_{bs}r^* $ & 
       
  1464  $\gg$ & 
       
  1465  $\textit{bs} @ [0] @ \textit{bs}_1@ \textit{bs}_2 $\\
       
  1466  
       
  1467  & & & $_{bs}r^*$ & $\gg$ & $\textit{bs} @ [1]$\\
       
  1468 \end{tabular}
       
  1469 \end{center}
       
  1470 It is a predicate in the sense that given 
       
  1471 a regular expression and a bitcode, it 
       
  1472 returns true or false, depending on whether 
       
  1473 or not the regular expression can actually produce that
       
  1474 value. If the predicates returns a true, then 
       
  1475 we say that the regular expression $r$ contains
       
  1476 the bitcode $\textit{bs}$, written 
       
  1477 $r \gg \textit{bs}$.
       
  1478 The $\gg$ operator with the
       
  1479 regular expression $r$ may also be seen as a 
       
  1480 machine that does a derivative of regular expressions
       
  1481 on all strings simultaneously, taking 
       
  1482 the bits by going throught the regular expression tree
       
  1483  structure in a depth first manner, regardless of whether
       
  1484  the part being traversed is nullable or not.
       
  1485  It put all possible bits that can be produced on such a traversal
       
  1486  into a set.
       
  1487  For example, if we are given the regular expression 
       
  1488 $((a+b)(c+d))^*$, the tree structure may be written as
       
  1489 \begin{center}
       
  1490 \begin{tikzpicture}
       
  1491 \tikz[tree layout]\graph[nodes={draw, circle}] {
       
  1492 * -> 
       
  1493     {@-> {
       
  1494     {+1 ->
       
  1495         {a , b}
       
  1496         },
       
  1497     {+ ->
       
  1498         {c , d }
       
  1499         }
       
  1500         }
       
  1501     }
       
  1502 };
       
  1503 \end{tikzpicture}
       
  1504 \end{center}
       
  1505 \subsection{the $\textit{ders}_2$ Function}
       
  1506 If we want to prove the result 
       
  1507 \begin{center}
       
  1508 	$ \textit{blexer}\_{simp}(r, \; s) =  \textit{blexer}(r, \; s)$
       
  1509 \end{center}
       
  1510 inductively 
       
  1511 on the structure of the regular expression,
       
  1512 then we need to induct on the  case $r_1 \cdot r_2$,
       
  1513 it can be good if we could express $(r_1 \cdot r_2) \backslash s$
       
  1514 in terms of $r_1 \backslash s$ and $r_2 \backslash s$,
       
  1515 and this naturally induces the $ders2$ function,
       
  1516 which does a "convolution" on $r_1$ and $r_2$ using the string
       
  1517 $s$.
       
  1518 It is based on the observation that the derivative of $r_1 \cdot r_2$
       
  1519 with respect to a string $s$ can actually be written in an "explicit form"
       
  1520 composed of $r_1$'s derivative of $s$ and $r_2$'s derivative of $s$.
       
  1521 This can be illustrated in the following procedure execution 
       
  1522 \begin{center}
       
  1523 	$ (r_1 \cdot r_2) \backslash [c_1c_2] =  (\textit{if} \; \nullable(r_1)\;  \textit{then} \; ((r_1 \backslash c_1) \cdot r_2 + r_2 \backslash c_1) \; \textit{else} \; (r_1\backslash c_1) \cdot r_2) \backslash c_2$
       
  1524 \end{center}
       
  1525 which can also be written in a "convoluted sum"
       
  1526 format:
       
  1527 \begin{center}
       
  1528 	$ (r_1 \cdot r_2) \backslash [c_1c_2] =  \sum{r_1 \backslash s_i \cdot r_2 \backslash s_j}$
       
  1529 \end{center}
       
  1530 In a more serious manner, we should write
       
  1531 \begin{center}
       
  1532 	$ (r_1 \cdot r_2) \backslash [c_1c_2] =  \sum{r_1 \backslash s_i \cdot r_2 \backslash s_j}$
       
  1533 \end{center}
       
  1534 Note this differentiates from the previous form in the sense that
       
  1535 it calculates the results $r_1\backslash s_i , r_2 \backslash s_j$ first and then glue them together
       
  1536 through nested alternatives whereas the $r_1 \cdot r_2 \backslash s$ procedure,
       
  1537 used by algorithm $\lexer$, can only produce each component of the 
       
  1538 resulting alternatives regular expression altogether rather than 
       
  1539 generating each of the children nodes one by one
       
  1540 n a single recursive call that is only for generating that
       
  1541 very expression itself.
       
  1542 We have this 
       
  1543 \section{Conclusion}
       
  1544 Under the exhaustive tests we believe the main
       
  1545 result holds, yet a proof still seems elusive.
       
  1546 We have tried out different approaches, and 
       
  1547 found a lot of properties of the function $\simp$.
       
  1548 The counterexamples where $\rup\backslash_{simp} \, s \neq \simp(\rup\backslash s)$
       
  1549 are also valuable in the sense that 
       
  1550 we get to know better why they are not equal and what 
       
  1551 are the subtle differences between a 
       
  1552 nested simplified regular expression and a 
       
  1553 regular expression that is simplified at the final moment.
       
  1554 We are almost there, but a last step is needed to make the proof work.
       
  1555 Hopefully in the next few weeks we will be able to find one.
  1382 %CONSTRUCTION SITE HERE
  1556 %CONSTRUCTION SITE HERE
  1383 that is to say, despite the bits are being moved around on the regular expression
  1557 that is to say, despite the bits are being moved around on the regular expression
  1384 (difference in bits), the structure of the (unannotated)regular expression
  1558 (difference in bits), the structure of the (unannotated)regular expression
  1385 after one simplification is exactly the same after the 
  1559 after one simplification is exactly the same after the 
  1386 same sequence of derivative operations 
  1560 same sequence of derivative operations 
  1398 
  1572 
  1399 However, without erase the above equality does not hold:
  1573 However, without erase the above equality does not hold:
  1400 for the regular expression  
  1574 for the regular expression  
  1401 $(a+b)(a+a*)$,
  1575 $(a+b)(a+a*)$,
  1402 if we do derivative with respect to string $aa$,
  1576 if we do derivative with respect to string $aa$,
  1403 we get
  1577 we get 
  1404 
  1578 
  1405 \subsection{Another Proof Strategy}
  1579 \subsection{Another Proof Strategy}
  1406 sdddddr does not equal sdsdsdsr sometimes.\\
  1580 sdddddr does not equal sdsdsdsr sometimes.\\
  1407 For example,
  1581 For example,
  1408 
  1582 
  2264 %data, that a similar bound can be obtained for the derivatives in
  2438 %data, that a similar bound can be obtained for the derivatives in
  2265 %Sulzmann and Lu's algorithm. Let us give some details about this next.
  2439 %Sulzmann and Lu's algorithm. Let us give some details about this next.
  2266 
  2440 
  2267 
  2441 
  2268 \begin{center}
  2442 \begin{center}
  2269 		$b ::=   S \mid  Z \qquad
  2443 		$b ::=   0 \mid  1 \qquad
  2270 bs ::= [] \mid b:bs    
  2444 bs ::= [] \mid b:bs    
  2271 $
  2445 $
  2272 \end{center}
  2446 \end{center}
  2273 
  2447 
  2274 \noindent
  2448 \noindent
  2275 The $S$ and $Z$ are arbitrary names for the bits in order to avoid 
  2449 The $0$ and $1$ are arbitrary names for the bits  and not in bold
       
  2450 in order to avoid 
  2276 confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or
  2451 confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or
  2277 bit-lists) can be used to encode values (or incomplete values) in a
  2452 bit-lists) can be used to encode values (or incomplete values) in a
  2278 compact form. This can be straightforwardly seen in the following
  2453 compact form. This can be straightforwardly seen in the following
  2279 coding function from values to bitcodes: 
  2454 coding function from values to bitcodes: 
  2280 
  2455 
  2281 \begin{center}
  2456 \begin{center}
  2282 \begin{tabular}{lcl}
  2457 \begin{tabular}{lcl}
  2283   $\textit{code}(\Empty)$ & $\dn$ & $[]$\\
  2458   $\textit{code}(\Empty)$ & $\dn$ & $[]$\\
  2284   $\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\
  2459   $\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\
  2285   $\textit{code}(\Left\,v)$ & $\dn$ & $\Z :: code(v)$\\
  2460   $\textit{code}(\Left\,v)$ & $\dn$ & $0 :: code(v)$\\
  2286   $\textit{code}(\Right\,v)$ & $\dn$ & $\S :: code(v)$\\
  2461   $\textit{code}(\Right\,v)$ & $\dn$ & $1 :: code(v)$\\
  2287   $\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\
  2462   $\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\
  2288   $\textit{code}(\Stars\,[])$ & $\dn$ & $[\Z]$\\
  2463   $\textit{code}(\Stars\,[])$ & $\dn$ & $[0]$\\
  2289   $\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $\S :: code(v) \;@\;
  2464   $\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $1 :: code(v) \;@\;
  2290                                                  code(\Stars\,vs)$
  2465                                                  code(\Stars\,vs)$
  2291 \end{tabular}    
  2466 \end{tabular}    
  2292 \end{center} 
  2467 \end{center} 
  2293 
  2468 
  2294 \noindent
  2469 \noindent
  2295 Here $\textit{code}$ encodes a value into a bitcodes by converting
  2470 Here $\textit{code}$ encodes a value into a bitcodes by converting
  2296 $\Left$ into $\Z$, $\Right$ into $\S$, the start point of a non-empty
  2471 $\Left$ into $0$, $\Right$ into $1$, the start point of a non-empty
  2297 star iteration into $\S$, and the border where a local star terminates
  2472 star iteration into $\S$, and the border where a local star terminates
  2298 into $\Z$. This coding is lossy, as it throws away the information about
  2473 into $0$. This coding is lossy, as it throws away the information about
  2299 characters, and also does not encode the ``boundary'' between two
  2474 characters, and also does not encode the ``boundary'' between two
  2300 sequence values. Moreover, with only the bitcode we cannot even tell
  2475 sequence values. Moreover, with only the bitcode we cannot even tell
  2301 whether the $\S$s and $\Z$s are for $\Left/\Right$ or $\Stars$. The
  2476 whether the $1$s and $0$s are for $\Left/\Right$ or $\Stars$. The
  2302 reason for choosing this compact way of storing information is that the
  2477 reason for choosing this compact way of storing information is that the
  2303 relatively small size of bits can be easily manipulated and ``moved
  2478 relatively small size of bits can be easily manipulated and ``moved
  2304 around'' in a regular expression. In order to recover values, we will 
  2479 around'' in a regular expression. In order to recover values, we will 
  2305 need the corresponding regular expression as an extra information. This
  2480 need the corresponding regular expression as an extra information. This
  2306 means the decoding function is defined as:
  2481 means the decoding function is defined as:
  2309 %\begin{definition}[Bitdecoding of Values]\mbox{}
  2484 %\begin{definition}[Bitdecoding of Values]\mbox{}
  2310 \begin{center}
  2485 \begin{center}
  2311 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
  2486 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
  2312   $\textit{decode}'\,bs\,(\ONE)$ & $\dn$ & $(\Empty, bs)$\\
  2487   $\textit{decode}'\,bs\,(\ONE)$ & $\dn$ & $(\Empty, bs)$\\
  2313   $\textit{decode}'\,bs\,(c)$ & $\dn$ & $(\Char\,c, bs)$\\
  2488   $\textit{decode}'\,bs\,(c)$ & $\dn$ & $(\Char\,c, bs)$\\
  2314   $\textit{decode}'\,(\Z\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
  2489   $\textit{decode}'\,(0\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
  2315      $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}\;
  2490      $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}\;
  2316        (\Left\,v, bs_1)$\\
  2491        (\Left\,v, bs_1)$\\
  2317   $\textit{decode}'\,(\S\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
  2492   $\textit{decode}'\,(1\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
  2318      $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_2\;\textit{in}\;
  2493      $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_2\;\textit{in}\;
  2319        (\Right\,v, bs_1)$\\                           
  2494        (\Right\,v, bs_1)$\\                           
  2320   $\textit{decode}'\,bs\;(r_1\cdot r_2)$ & $\dn$ &
  2495   $\textit{decode}'\,bs\;(r_1\cdot r_2)$ & $\dn$ &
  2321         $\textit{let}\,(v_1, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}$\\
  2496         $\textit{let}\,(v_1, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}$\\
  2322   & &   $\textit{let}\,(v_2, bs_2) = \textit{decode}'\,bs_1\,r_2$\\
  2497   & &   $\textit{let}\,(v_2, bs_2) = \textit{decode}'\,bs_1\,r_2$\\
  2323   & &   \hspace{35mm}$\textit{in}\;(\Seq\,v_1\,v_2, bs_2)$\\
  2498   & &   \hspace{35mm}$\textit{in}\;(\Seq\,v_1\,v_2, bs_2)$\\
  2324   $\textit{decode}'\,(\Z\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\
  2499   $\textit{decode}'\,(0\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\
  2325   $\textit{decode}'\,(\S\!::\!bs)\,(r^*)$ & $\dn$ & 
  2500   $\textit{decode}'\,(1\!::\!bs)\,(r^*)$ & $\dn$ & 
  2326          $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\
  2501          $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\
  2327   & &   $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$\\
  2502   & &   $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$\\
  2328   & &   \hspace{35mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\bigskip\\
  2503   & &   \hspace{35mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\bigskip\\
  2329   
  2504   
  2330   $\textit{decode}\,bs\,r$ & $\dn$ &
  2505   $\textit{decode}\,bs\,r$ & $\dn$ &
  2340 \emph{Annotated regular expressions} are defined by the following
  2515 \emph{Annotated regular expressions} are defined by the following
  2341 grammar:%\comment{ALTS should have  an $as$ in  the definitions, not  just $a_1$ and $a_2$}
  2516 grammar:%\comment{ALTS should have  an $as$ in  the definitions, not  just $a_1$ and $a_2$}
  2342 
  2517 
  2343 \begin{center}
  2518 \begin{center}
  2344 \begin{tabular}{lcl}
  2519 \begin{tabular}{lcl}
  2345   $\textit{a}$ & $::=$  & $\textit{ZERO}$\\
  2520   $\textit{a}$ & $::=$  & $\ZERO$\\
  2346                   & $\mid$ & $\textit{ONE}\;\;bs$\\
  2521                   & $\mid$ & $_{bs}\ONE$\\
  2347                   & $\mid$ & $\textit{CHAR}\;\;bs\,c$\\
  2522                   & $\mid$ & $_{bs}{\bf c}$\\
  2348                   & $\mid$ & $\textit{ALTS}\;\;bs\,as$\\
  2523                   & $\mid$ & $_{bs}\oplus \textit{as}$\\
  2349                   & $\mid$ & $\textit{SEQ}\;\;bs\,a_1\,a_2$\\
  2524                   & $\mid$ & $_{bs}a_1\cdot a_2$\\
  2350                   & $\mid$ & $\textit{STAR}\;\;bs\,a$
  2525                   & $\mid$ & $_{bs}a^*$
  2351 \end{tabular}    
  2526 \end{tabular}    
  2352 \end{center}  
  2527 \end{center}  
  2353 %(in \textit{ALTS})
  2528 %(in \textit{ALTS})
  2354 
  2529 
  2355 \noindent
  2530 \noindent
  2356 where $bs$ stands for bitcodes, $a$  for $\bold{a}$nnotated regular
  2531 where $\textit{bs}$ stands for bitcodes, $a$  for $\textit{annotated}$ regular
  2357 expressions and $as$ for a list of annotated regular expressions.
  2532 expressions and $\textit{as}$ for a list of annotated regular expressions.
  2358 The alternative constructor($\textit{ALTS}$) has been generalized to 
  2533 The alternative constructor($\oplus$) has been generalized to 
  2359 accept a list of annotated regular expressions rather than just 2.
  2534 accept a list of annotated regular expressions rather than just 2.
  2360 We will show that these bitcodes encode information about
  2535 We will show that these bitcodes encode information about
  2361 the (POSIX) value that should be generated by the Sulzmann and Lu
  2536 the (POSIX) value that should be generated by the Sulzmann and Lu
  2362 algorithm.
  2537 algorithm.
  2363 
  2538 
  2368 defined as follows:
  2543 defined as follows:
  2369 
  2544 
  2370 %\begin{definition}
  2545 %\begin{definition}
  2371 \begin{center}
  2546 \begin{center}
  2372 \begin{tabular}{lcl}
  2547 \begin{tabular}{lcl}
  2373   $(\ZERO)^\uparrow$ & $\dn$ & $\textit{ZERO}$\\
  2548   $(\ZERO)^\uparrow$ & $\dn$ & $\ZERO$\\
  2374   $(\ONE)^\uparrow$ & $\dn$ & $\textit{ONE}\,[]$\\
  2549   $(\ONE)^\uparrow$ & $\dn$ & $_{[]}\ONE$\\
  2375   $(c)^\uparrow$ & $\dn$ & $\textit{CHAR}\,[]\,c$\\
  2550   $(c)^\uparrow$ & $\dn$ & $_{bs}\textit{\bf c}$\\
  2376   $(r_1 + r_2)^\uparrow$ & $\dn$ &
  2551   $(r_1 + r_2)^\uparrow$ & $\dn$ &
  2377   $\textit{ALTS}\;[]\,List((\textit{fuse}\,[\Z]\,r_1^\uparrow),\,
  2552   $_{[]}\oplus\,[(\textit{fuse}\,[0]\,r_1^\uparrow),\,
  2378   (\textit{fuse}\,[\S]\,r_2^\uparrow))$\\
  2553   (\textit{fuse}\,[1]\,r_2^\uparrow)]$\\
  2379   $(r_1\cdot r_2)^\uparrow$ & $\dn$ &
  2554   $(r_1\cdot r_2)^\uparrow$ & $\dn$ &
  2380          $\textit{SEQ}\;[]\,r_1^\uparrow\,r_2^\uparrow$\\
  2555          $_{[]}r_1^\uparrow \cdot r_2^\uparrow$\\
  2381   $(r^*)^\uparrow$ & $\dn$ &
  2556   $(r^*)^\uparrow$ & $\dn$ &
  2382          $\textit{STAR}\;[]\,r^\uparrow$\\
  2557          $_{[]}r^\uparrow$\\
  2383 \end{tabular}    
  2558 \end{tabular}    
  2384 \end{center}    
  2559 \end{center}    
  2385 %\end{definition}
  2560 %\end{definition}
  2386 
  2561 
  2387 \noindent
  2562 \noindent
  2391 attach bits to the front of an annotated regular expression. Its
  2566 attach bits to the front of an annotated regular expression. Its
  2392 definition is as follows:
  2567 definition is as follows:
  2393 
  2568 
  2394 \begin{center}
  2569 \begin{center}
  2395 \begin{tabular}{lcl}
  2570 \begin{tabular}{lcl}
  2396   $\textit{fuse}\;bs\,(\textit{ZERO})$ & $\dn$ & $\textit{ZERO}$\\
  2571   $\textit{fuse}\;bs\,(\ZERO)$ & $\dn$ & $\textit{ZERO}$\\
  2397   $\textit{fuse}\;bs\,(\textit{ONE}\,bs')$ & $\dn$ &
  2572   $\textit{fuse}\;bs\,(\textit{ONE}\,bs')$ & $\dn$ &
  2398      $\textit{ONE}\,(bs\,@\,bs')$\\
  2573      $\textit{ONE}\,(bs\,@\,bs')$\\
  2399   $\textit{fuse}\;bs\,(\textit{CHAR}\,bs'\,c)$ & $\dn$ &
  2574   $\textit{fuse}\;bs\,(\textit{CHAR}\,bs'\,c)$ & $\dn$ &
  2400      $\textit{CHAR}\,(bs\,@\,bs')\,c$\\
  2575      $\textit{CHAR}\,(bs\,@\,bs')\,c$\\
  2401   $\textit{fuse}\;bs\,(\textit{ALTS}\,bs'\,as)$ & $\dn$ &
  2576   $\textit{fuse}\;bs\,(\textit{ALTS}\,bs'\,as)$ & $\dn$ &