diff -r b6984212cf87 -r 69292c2b54d8 etnms/etnms.tex --- a/etnms/etnms.tex Sat Feb 08 22:24:21 2020 +0000 +++ b/etnms/etnms.tex Sun Feb 16 22:28:26 2020 +0000 @@ -473,7 +473,7 @@ $\textit{a}$ & $::=$ & $\ZERO$\\ & $\mid$ & $_{bs}\ONE$\\ & $\mid$ & $_{bs}{\bf c}$\\ - & $\mid$ & $_{bs}\oplus\,as$\\ + & $\mid$ & $_{bs}\sum\,as$\\ & $\mid$ & $_{bs}a_1\cdot a_2$\\ & $\mid$ & $_{bs}a^*$ \end{tabular} @@ -483,7 +483,7 @@ \noindent where $bs$ stands for bitcodes, $a$ for $\bold{a}$nnotated regular expressions and $as$ for a list of annotated regular expressions. -The alternative constructor($\oplus$) has been generalized to +The alternative constructor($\sum$) has been generalized to accept a list of annotated regular expressions rather than just 2. We will show that these bitcodes encode information about the (POSIX) value that should be generated by the Sulzmann and Lu @@ -502,7 +502,7 @@ $(\ONE)^\uparrow$ & $\dn$ & $_{[]}\ONE$\\ $(c)^\uparrow$ & $\dn$ & $_{[]}{\bf c}$\\ $(r_1 + r_2)^\uparrow$ & $\dn$ & - $_{[]}\oplus[\textit{fuse}\,[0]\,r_1^\uparrow,\, + $_{[]}\sum[\textit{fuse}\,[0]\,r_1^\uparrow,\, \textit{fuse}\,[1]\,r_2^\uparrow]$\\ $(r_1\cdot r_2)^\uparrow$ & $\dn$ & $_{[]}r_1^\uparrow \cdot r_2^\uparrow$\\ @@ -526,8 +526,8 @@ $_{bs @ bs'}\ONE$\\ $\textit{fuse}\;bs\;_{bs'}{\bf c}$ & $\dn$ & $_{bs@bs'}{\bf c}$\\ - $\textit{fuse}\;bs\,_{bs'}\oplus\textit{as}$ & $\dn$ & - $_{bs@bs'}\oplus\textit{as}$\\ + $\textit{fuse}\;bs\,_{bs'}\sum\textit{as}$ & $\dn$ & + $_{bs@bs'}\sum\textit{as}$\\ $\textit{fuse}\;bs\; _{bs'}a_1\cdot a_2$ & $\dn$ & $_{bs@bs'}a_1 \cdot a_2$\\ $\textit{fuse}\;bs\,_{bs'}a^*$ & $\dn$ & @@ -595,12 +595,12 @@ $(_{bs}{\bf d})\,\backslash c$ & $\dn$ & $\textit{if}\;c=d\; \;\textit{then}\; _{bs}\ONE\;\textit{else}\;\ZERO$\\ - $(_{bs}\oplus \;\textit{as})\,\backslash c$ & $\dn$ & - $_{bs}\oplus\;(\textit{as.map}(\backslash c))$\\ + $(_{bs}\sum \;\textit{as})\,\backslash c$ & $\dn$ & + $_{bs}\sum\;(\textit{as.map}(\backslash c))$\\ $(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ & $\textit{if}\;\textit{bnullable}\,a_1$\\ - & &$\textit{then}\;_{bs}\oplus\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\ - & &$\phantom{\textit{then},\;_{bs}\oplus\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\ + & &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\ + & &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\ & &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$\\ $(_{bs}a^*)\,\backslash c$ & $\dn$ & $_{bs}(\textit{fuse}\, [0] \; r\,\backslash c)\cdot @@ -625,7 +625,7 @@ head of the regular expression $a_2 \backslash c$ by fusing to it. The bitsequence $\textit{bs}$, which was initially attached to the first element of the sequence $a_1 \cdot a_2$, has -now been elevated to the top-level of $\oplus$, as this information will be +now been elevated to the top-level of $\sum$, as this information will be needed whichever way the sequence is matched---no matter whether $c$ belongs to $a_1$ or $ a_2$. After building these derivatives and maintaining all the lexing information, we complete the lexing by collecting the @@ -637,10 +637,10 @@ \begin{center} \begin{tabular}{lcl} $\textit{bmkeps}\,(_{bs}\ONE)$ & $\dn$ & $bs$\\ - $\textit{bmkeps}\,(_{bs}\oplus a::\textit{as})$ & $\dn$ & + $\textit{bmkeps}\,(_{bs}\sum a::\textit{as})$ & $\dn$ & $\textit{if}\;\textit{bnullable}\,a$\\ & &$\textit{then}\;bs\,@\,\textit{bmkeps}\,a$\\ - & &$\textit{else}\;bs\,@\,\textit{bmkeps}\,(_{bs}\oplus \textit{as})$\\ + & &$\textit{else}\;bs\,@\,\textit{bmkeps}\,(_{bs}\sum \textit{as})$\\ $\textit{bmkeps}\,(_{bs} a_1 \cdot a_2)$ & $\dn$ & $bs \,@\,\textit{bmkeps}\,a_1\,@\, \textit{bmkeps}\,a_2$\\ $\textit{bmkeps}\,(_{bs}a^*)$ & $\dn$ & @@ -684,7 +684,7 @@ expressions and strings. One modification we introduced is to allow a list of annotated regular -expressions in the $\oplus$ constructor. This allows us to not just +expressions in the $\sum$ constructor. This allows us to not just delete unnecessary $\ZERO$s and $\ONE$s from regular expressions, but also unnecessary ``copies'' of regular expressions (very similar to simplifying $r + r$ to just $r$, but in a more general setting). Another @@ -707,10 +707,10 @@ &&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow \textit{fuse} \; bs \; a_1'$ \\ &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$ \\ - $\textit{simp} \; (_{bs}\oplus \textit{as})$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $ \\ + $\textit{simp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $ \\ &&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\ &&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\ - &&$\quad\textit{case} \; as' \Rightarrow _{bs}\oplus \textit{as'}$\\ + &&$\quad\textit{case} \; as' \Rightarrow _{bs}\sum \textit{as'}$\\ $\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$ \end{tabular} @@ -722,16 +722,16 @@ sequence, it will try to simplify its children regular expressions recursively and then see if one of the children turn into $\ZERO$ or $\ONE$, which might trigger further simplification at the current level. -The most involved part is the $\oplus$ clause, where we use two +The most involved part is the $\sum$ clause, where we use two auxiliary functions $\textit{flatten}$ and $\textit{distinct}$ to open up nested alternatives and reduce as many duplicates as possible. Function $\textit{distinct}$ keeps the first occurring copy only and remove all later ones -when detected duplicates. Function $\textit{flatten}$ opens up nested $\oplus$s. +when detected duplicates. Function $\textit{flatten}$ opens up nested $\sum$s. Its recursive definition is given below: \begin{center} \begin{tabular}{@{}lcl@{}} - $\textit{flatten} \; (_{bs}\oplus \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \; + $\textit{flatten} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \; (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flatten} \; as' $ \\ $\textit{flatten} \; \ZERO :: as'$ & $\dn$ & $ \textit{flatten} \; \textit{as'} $ \\ $\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; \textit{as'}$ \quad(otherwise) @@ -961,10 +961,10 @@ \begin{tabular}{@{}l@{\hspace{2mm}}c@{\hspace{2mm}}l@{}} $\textit{retrieve}\,(_{bs}\ONE)\,\Empty$ & $\dn$ & $bs$\\ $\textit{retrieve}\,(_{bs}{\bf c})\,(\Char\,d)$ & $\dn$ & $bs$\\ - $\textit{retrieve}\,(_{bs}\oplus a::as)\,(\Left\,v)$ & $\dn$ & + $\textit{retrieve}\,(_{bs}\sum a::as)\,(\Left\,v)$ & $\dn$ & $bs \,@\, \textit{retrieve}\,a\,v$\\ - $\textit{retrieve}\,(_{bs}\oplus a::as)\,(\Right\,v)$ & $\dn$ & - $\textit{bs} \,@\, \textit{retrieve}\,(_{[]}\oplus as)\,v$\\ + $\textit{retrieve}\,(_{bs}\sum a::as)\,(\Right\,v)$ & $\dn$ & + $\textit{bs} \,@\, \textit{retrieve}\,(_{[]}\sum as)\,v$\\ $\textit{retrieve}\,(_{bs}a_1\cdot a_2)\,(\Seq\,v_1\,v_2)$ & $\dn$ & $bs \,@\,\textit{retrieve}\,a_1\,v_1\,@\, \textit{retrieve}\,a_2\,v_2$\\ $\textit{retrieve}\,(_{bs}a^*)\,(\Stars\,[])$ & $\dn$ & @@ -1165,19 +1165,19 @@ \noindent -and the difinition of $\flatten$: +and the definition of $\flatten$: \begin{center} \begin{tabular}{c c c} $\flatten \; []$ & $\dn$ & $[]$\\ $\flatten \; \ZERO::rs$ & $\dn$ & $rs$\\ - $\flatten \;(_{\textit{bs}_1}\oplus \textit{rs}_1 ::rs)$ & $\dn$ & $(\map \, (\fuse \, \textit{bs}_1) \,\textit{rs}_1) ::: \flatten(rs)$\\ + $\flatten \;(_{\textit{bs}_1}\sum \textit{rs}_1 ::rs)$ & $\dn$ & $(\map \, (\fuse \, \textit{bs}_1) \,\textit{rs}_1) ::: \flatten(rs)$\\ $\flatten \; r :: rs$ & $\dn$ & $r::\flatten(rs)$ \end{tabular} \end{center} \noindent If we call $\simp$ on $\rup\backslash a$, just as $\backslash_{simp}$ -requires, then we would go throught the third clause of +requires, then we would go through the third clause of the sequence case:$\quad\textit{case} \; (\ONE, a_2') \Rightarrow \textit{fuse} \; bs \; a_2'$. The $\ZERO$ of $(_0\ONE + \ZERO)$ is thrown away by $\flatten$ and @@ -1190,7 +1190,11 @@ _0(_0 \ONE + _1(_1\ONE \cdot a^*)) $, and this simplifies to $_0(_0 \ONE + _{11} a^*)$ by the third clause of the alternative case: - $\quad\textit{case} \; as' \Rightarrow \textit{ALTS}\;bs\;as'$. + \begin{center} + $\quad\textit{case} \; as' \Rightarrow _{bs}\sum{as'}$. + \end{center} + + \noindent The outmost bit $_0$ stays with the outmost regular expression, rather than being fused to its child regular expressions, as what we will later see happens @@ -1240,8 +1244,8 @@ \begin{center} \begin{tabular}{lcl} $D =\{ (r_1 \cdot r_2,\; [c_1c_2]) \mid $ & $\simp(r_2) = r_2, \simp(r_1 \backslash c_1) = \ONE,$\\ - $r_1 \; not \; \nullable, c_2 \in L(r_2),$ & $\exists \textit{rs},\textit{bs}.\; r_2 \backslash c_2 = _{bs}{\oplus rs}$\\ -$\exists \textit{rs}_1. \; \simp(r_2 \backslash c_2) = _{bs}{\oplus \textit{rs}_1}$ & $and \;\simp(r_1 \backslash [c_1c_2]) = \ZERO\}$\\ + $r_1 \; not \; \nullable, c_2 \in L(r_2),$ & $\exists \textit{rs},\textit{bs}.\; r_2 \backslash c_2 = _{bs}{\sum rs}$\\ +$\exists \textit{rs}_1. \; \simp(r_2 \backslash c_2) = _{bs}{\sum \textit{rs}_1}$ & $and \;\simp(r_1 \backslash [c_1c_2]) = \ZERO\}$\\ \end{tabular} \end{center} We take a pair $(r, \;s)$ from the set $D$. @@ -1265,9 +1269,9 @@ \end{center} which is equal to \begin{center} -$\simp \left[ \fuse \; \textit{bs}_2 \; _{bs}{\oplus rs} \right]$\\ -$=\simp \left[ \; _{bs_2++bs}{\oplus rs} \right]$\\ -$= \; _{bs_2++bs}{\oplus \textit{rs}_1} $ +$\simp \left[ \fuse \; \textit{bs}_2 \; _{bs}{\sum rs} \right]$\\ +$=\simp \left[ \; _{bs_2++bs}{\sum rs} \right]$\\ +$= \; _{bs_2++bs}{\sum \textit{rs}_1} $ \end{center} \noindent by using the properties from the set $D$ again @@ -1284,34 +1288,34 @@ the following chain of equalities: \begin{center} $\simp(\rup \backslash s)= \simp \left[ ((r_1 \cdot r_2 )\backslash c_1) \backslash c_2 \right]= \simp\left[ \left(r_1 \backslash c_1\right) \cdot r_2 \big) \backslash c_2 \right]$\\ -$= \simp \left[ \oplus[\big( \left(r_1 \backslash c_1\right) \backslash c_2 \big) \cdot r_2 \; , \; \fuse \; (\bmkeps \;r_1\backslash c_1) \; r_2 \backslash c_2 ] \right]$, +$= \simp \left[ \sum[\big( \left(r_1 \backslash c_1\right) \backslash c_2 \big) \cdot r_2 \; , \; \fuse \; (\bmkeps \;r_1\backslash c_1) \; r_2 \backslash c_2 ] \right]$, \end{center} \noindent as before, we call the bitcode returned by $\bmkeps(r_1\backslash c_1)$ as $\textit{bs}_2$. Also, $\simp(r_2 \backslash c_2)$ is -$_{bs}\oplus \textit{rs}_1$, +$_{bs}\sum \textit{rs}_1$, and $( \left(r_1 \backslash c_1\right) \backslash c_2 \cdot r_2)$ simplifies to $\ZERO$, so the above term can be expanded as \begin{center} \begin{tabular}{l} -$\textit{distinct}(\flatten[\ZERO\;, \; _{\textit{bs}_2++\textit{bs}}\oplus \textit{rs}_1] ) \; \textit{match} $ \\ +$\textit{distinct}(\flatten[\ZERO\;, \; _{\textit{bs}_2++\textit{bs}}\sum \textit{rs}_1] ) \; \textit{match} $ \\ $\textit{case} \; [] \Rightarrow \ZERO$ \\ $\textit{case} \; a :: [] \Rightarrow \textit{\fuse \; \textit{bs} a}$ \\ - $\textit{case} \; as' \Rightarrow _{[]}\oplus as'$\\ + $\textit{case} \; as' \Rightarrow _{[]}\sum as'$\\ \end{tabular} \end{center} \noindent Applying the definition of $\flatten$, we get \begin{center} -$_{[]}\oplus (\textit{map} \; \fuse (\textit{bs}_2 ++ bs) rs_1)$ +$_{[]}\sum (\textit{map} \; \fuse (\textit{bs}_2 ++ bs) rs_1)$ \end{center} \noindent compared to the result \begin{center} -$ \; _{bs_2++bs}{\oplus \textit{rs}_1} $ +$ \; _{bs_2++bs}{\sum \textit{rs}_1} $ \end{center} \noindent Note how these two regular expressions only @@ -1325,7 +1329,7 @@ caused $\simp(\rup \backslash s)$ to generate the nested alternatives structure: \begin{center} -$ \oplus[\ZERO \;, \; _{bs}\oplus \textit{rs} ]$ +$ \sum[\ZERO \;, \; _{bs}\sum \textit{rs} ]$ \end{center} and this will always trigger the $\flatten$ to spill out the inner alternative's bitcode $\textit{bs}$, @@ -1340,9 +1344,9 @@ Like, for the above regular expression, we might just delete the outer layer of alternative \begin{center} -\st{$ {\oplus[\ZERO \;,}$} $_{bs}\oplus \textit{rs}$ \st{$]$} +\st{$ {\sum[\ZERO \;,}$} $_{bs}\sum \textit{rs}$ \st{$]$} \end{center} -and get $_{bs}\oplus \textit{rs}$ instead, without +and get $_{bs}\sum \textit{rs}$ instead, without fusing the bits $\textit{bs}$ inside to every element of $\textit{rs}$. This idea can be realized by making the following @@ -1352,13 +1356,13 @@ $\textit{simp}' \; (_{\textit{bs}}(a_1 \cdot a_2))$ & $\dn$ & $\textit{as} \; \simp \; \textit{was} \; \textit{before} $ \\ - $\textit{simp}' \; (_{bs}\oplus as)$ & $\dn$ & \st{$\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $} \\ + $\textit{simp}' \; (_{bs}\sum as)$ & $\dn$ & \st{$\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $} \\ &&\st{$\quad\textit{case} \; [] \Rightarrow \ZERO$} \\ &&\st{$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$} \\ &&\st{$\quad\textit{case} \; as' \Rightarrow \textit{ALTS}\;bs\;as'$}\\ &&$\textit{if}(\textit{hollowAlternatives}( \textit{map \; simp \; as}))$\\ &&$\textit{then} \; \fuse \; \textit{bs}\; \textit{extractAlt}(\textit{map} \; \simp \; \textit{as} )$\\ - &&$\textit{else} \; \simp(_{bs} \oplus \textit{as})$\\ + &&$\textit{else} \; \simp(_{bs} \sum \textit{as})$\\ $\textit{simp}' \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$ @@ -1369,23 +1373,23 @@ given the definition of $\textit{hollowAlternatives}$ and $\textit{extractAlt}$ : \begin{center} $\textit{hollowAlternatives}( \textit{rs}) \dn -\exists r = (_{\textit{bs}_1}\oplus \textit{rs}_1) \in \textit{rs}. \forall r' \in \textit{rs}, \; +\exists r = (_{\textit{bs}_1}\sum \textit{rs}_1) \in \textit{rs}. \forall r' \in \textit{rs}, \; \textit{either} \; r' = \ZERO \; \textit{or} \; r' = r $ $\textit{extractAlt}( \textit{rs}) \dn \textit{if}\big( -\exists r = (_{\textit{bs}_1}\oplus \textit{rs}_1) \in \textit{rs}. \forall r' \in \textit{rs}, \; +\exists r = (_{\textit{bs}_1}\sum \textit{rs}_1) \in \textit{rs}. \forall r' \in \textit{rs}, \; \textit{either} \; r' = \ZERO \; \textit{or} \; r' = r \big)\; \textit{then} \; \textit{return} \; r$ \end{center} \noindent Basically, $\textit{hollowAlternatives}$ captures the feature of a list of regular expression of the shape \begin{center} -$ \oplus[\ZERO \;, \; _{bs}\oplus \textit{rs} ]$ +$ \sum[\ZERO \;, \; _{bs}\sum \textit{rs} ]$ \end{center} and this means we can simply elevate the -inner regular expression $_{bs}\oplus \textit{rs}$ +inner regular expression $_{bs}\sum \textit{rs}$ to the outmost and throw away the useless $\ZERO$s and -the outer $\oplus$ wrapper. +the outer $\sum$ wrapper. Using this new definition of simp, under the example where $r$ is the regular expression $(a+b)(a+a*)$ and $s$ is the string $aa$ @@ -1460,9 +1464,9 @@ $\textit{sub}(r_1 + r_2) \dn \textit{sub}(r_1) \cup \textit{sub}(r_2) \cup \{r_1+r_2\}$\\ \end{center} $\textit{good}(r) \dn r \neq \ZERO \land \\ -\forall r' \in \textit{sub}(r), \textit{if} \; r' = _{bs_1}\oplus(rs_1), \; +\forall r' \in \textit{sub}(r), \textit{if} \; r' = _{bs_1}\sum(rs_1), \; \textit{then} \nexists r'' \in \textit{rs}_1 \; s.t.\; -r'' = _{bs_2}\oplus \textit{rs}_2 $ +r'' = _{bs_2}\sum \textit{rs}_2 $ The properties are mainly the ones below: \begin{itemize} @@ -1507,12 +1511,12 @@ $\textit{bs} @ \textit{bs}_1 @ \textit{bs}_2$\\ $\textit{if} \; r \gg \textit{bs}_1$ & & $\textit{then}$ & - $_{bs}{\oplus(r :: \textit{rs}})$ & + $_{bs}{\sum(r :: \textit{rs}})$ & $\gg$ & $\textit{bs} @ \textit{bs}_1 $\\ - $\textit{if} \; _{bs}(\oplus \textit{rs}) \gg \textit{bs} @ \textit{bs}_1$ & & $\textit{then}$ & - $_{bs}{\oplus(r :: \textit{rs}})$ & + $\textit{if} \; _{bs}(\sum \textit{rs}) \gg \textit{bs} @ \textit{bs}_1$ & & $\textit{then}$ & + $_{bs}{\sum(r :: \textit{rs}})$ & $\gg$ & $\textit{bs} @ \textit{bs}_1 $\\