diff -r 481c8000de6d -r d9d61a648292 ninems/ninems.tex --- a/ninems/ninems.tex Mon Jul 22 23:01:00 2019 +0100 +++ b/ninems/ninems.tex Tue Jul 23 09:26:22 2019 +0100 @@ -586,7 +586,7 @@ for conciseness). Before $\textit{inj}$ comes into play, our lexer first builds derivative using string $abc$ (we simplified some regular expressions like -$0 \cdot b$ to $0$ for conciseness; we also omit parentheses if +$\ZERO \cdot b$ to $\ZERO$ for conciseness; we also omit parentheses if they are clear from the context): %Similarly, we allow %$\textit{ALT}$ to take a list of regular expressions as an argument @@ -746,9 +746,9 @@ \begin{center} \begin{tabular}{lcl} - $\textit{pder} \; c \; 0$ & $\dn$ & $\emptyset$\\ - $\textit{pder} \; c \; 1$ & $\dn$ & $\emptyset$ \\ - $\textit{pder} \; c \; d$ & $\dn$ & $\textit{if} \; c \,=\, d \; \{ 1 \} \; \textit{else} \; \emptyset$ \\ + $\textit{pder} \; c \; \ZERO$ & $\dn$ & $\emptyset$\\ + $\textit{pder} \; c \; \ONE$ & $\dn$ & $\emptyset$ \\ + $\textit{pder} \; c \; d$ & $\dn$ & $\textit{if} \; c \,=\, d \; \{ \ONE \} \; \textit{else} \; \emptyset$ \\ $\textit{pder} \; c \; r_1+r_2$ & $\dn$ & $pder \; c \; r_1 \cup pder \; c \; r_2$ \\ $\textit{pder} \; c \; r_1 \cdot r_2$ & $\dn$ & $\textit{if} \; nullable \; r_1 $\\ & & $\textit{then} \; \{ r \cdot r_2 \mid r \in pder \; c \; r_1 \} \cup pder \; c \; r_2 \;$\\ @@ -929,7 +929,8 @@ annotated regular expressions. This derivative operation is the same as what we previously have for the simple regular expressions, except that we beed to take special care of the bitcodes:\comment{You need to be consitent with ALTS and ALT; ALT is just -an abbreviation; derivations and so on are defined for ALTS} +an abbreviation; derivations and so on are defined for ALTS}\comment{no this is not the case, +ALT for 2 regexes, ALTS for a list} %\begin{definition}{bder} \begin{center} @@ -1039,16 +1040,16 @@ \begin{tabular}{@{}lcl@{}} $\textit{simp} \; (\textit{SEQ}\;bs\,a_1\,a_2)$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp} \; a_2) \; \textit{match} $ \\ - &&$\quad\textit{case} \; (0, \_) \Rightarrow 0$ \\ - &&$\quad\textit{case} \; (\_, 0) \Rightarrow 0$ \\ - &&$\quad\textit{case} \; (1, a_2') \Rightarrow \textit{fuse} \; bs \; a_2'$ \\ - &&$\quad\textit{case} \; (a_1', 1) \Rightarrow \textit{fuse} \; bs \; a_1'$ \\ + &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\ + &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\ + &&$\quad\textit{case} \; (\ONE, a_2') \Rightarrow \textit{fuse} \; bs \; a_2'$ \\ + &&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow \textit{fuse} \; bs \; a_1'$ \\ &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow \textit{SEQ} \; bs \; a_1' \; a_2'$ \\ $\textit{simp} \; (\textit{ALTS}\;bs\,as)$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $ \\ - &&$\quad\textit{case} \; [] \Rightarrow 0$ \\ + &&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\ &&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\ - &&$\quad\textit{case} \; as' \Rightarrow \textit{ALT}\;bs\;as'$\\ + &&$\quad\textit{case} \; as' \Rightarrow \textit{ALTS}\;bs\;as'$\\ $\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$ \end{tabular} @@ -1064,12 +1065,12 @@ auxiliary functions flatten and distinct to open up nested $\textit{ALTS}$ and reduce as many duplicates as possible. Function distinct keeps the first occurring copy only and remove all later ones -when detected duplicates. Function flatten opens up nested \textit{ALT}. +when detected duplicates. Function flatten opens up nested \textit{ALTS}. Its recursive definition is given below: \begin{center} \begin{tabular}{@{}lcl@{}} - $\textit{flatten} \; (\textit{ALT}\;bs\,as) :: as'$ & $\dn$ & $(\textit{map} \; + $\textit{flatten} \; (\textit{ALTS}\;bs\,as) :: as'$ & $\dn$ & $(\textit{map} \; (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flatten} \; as' $ \\ $\textit{flatten} \; \textit{ZERO} :: as'$ & $\dn$ & $ \textit{flatten} \; as' $ \\ $\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; as'$ \quad(otherwise) @@ -1090,7 +1091,7 @@ \begin{center} \begin{tabular}{lcl} $r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\ -$r \backslash [\,] $ & $\dn$ & $r$ +$r \backslash_{simp} [\,] $ & $\dn$ & $r$ \end{tabular} \end{center} @@ -1142,10 +1143,15 @@ straightforward to establish that after one simplification step, the part of a nullable derivative that corresponds to a POSIX value remains intact and can still be collected, in other words, we can show that\comment{Double-check....I -think this is not the case} +think this is not the case}\comment{If i remember correctly, you have proved this lemma. +I feel this is indeed not true because you might place arbitrary +bits on the regex r, however if this is the case, did i remember wrongly that +you proved something like simplification does not affect $\textit{bmkeps}$ results? +Anyway, i have amended this a little bit so it does not allow arbitrary bits attached +to a regex. Maybe it works now.} \begin{center} -$\textit{bmkeps} \; r = \textit{bmkeps} \; \textit{simp} \; r\;( r\; \textit{nullable})$ +$\textit{bmkeps} \; a = \textit{bmkeps} \; \textit{bsimp} \; a\;( a\; \textit{bnullable} and \textit{decode}(r, \textit{bmkeps}(a)) is a \textit{POSIX} value)$ \end{center} \noindent @@ -1159,47 +1165,63 @@ \noindent That is, if we do derivative on regular expression $r$ and then simplify it, and repeat this process until we exhaust the string, we get a -regular expression $r''$ that provides the POSIX matching as the result $r'$ of -the normal derivative algorithm which only does derivative operation repeatedly -and has no simplification at all. This might seem at first glance very -unintuitive, as the $r'$ is exponentially larger than $r''$. But this can be -explained in the following way: we are pruning away the possible matches that -are not POSIX. Since there are exponentially non-POSIX matchings and only 1 -POSIX matching, it is understandable that our $r''$ can be a lot smaller. we -can still provide the same POSIX value if there is one. This is not as -straightforward as the previous proposition, as the two regular expressions $r$ -and $\textit{simp}\; r$ might become very different regular expressions after -repeated application of $\textit{simp}$ and derivative. The crucial point is -to find the indispensable information of a regular expression and how it is -kept intact during simplification so that it performs as good as a regular -expression that has not been simplified in the subsequent derivative -operations. To aid this, we use the helping function retrieve described by -Sulzmann and Lu: \\definition of retrieve\\ - -This function assembled the bitcode that corresponds to a parse tree for -how the current derivative matches the suffix of the string(the -characters that have not yet appeared, but is stored in the value). -Sulzmann and Lu used this to connect the bitcoded algorithm to the older -algorithm by the following equation: +regular expression $r''$($\textit{LHS}$) that provides the POSIX matching +information, which is exactly the same as the result $r'$($\textit{RHS}$ of the +normal derivative algorithm that only does derivative repeatedly and has no +simplification at all. This might seem at first glance very unintuitive, as +the $r'$ is exponentially larger than $r''$, but can be explained in the +following way: we are pruning away the possible matches that are not POSIX. +Since there are exponentially non-POSIX matchings and only 1 POSIX matching, it +is understandable that our $r''$ can be a lot smaller. we can still provide +the same POSIX value if there is one. This is not as straightforward as the +previous proposition, as the two regular expressions $r'$ and $r''$ might have +become very different regular expressions. The crucial point is to find the +$\textit{POSIX}$ information of a regular expression and how it is modified, +augmented and propagated +during simplification in parallel with the regularr expression that +has not been simplified in the subsequent derivative operations. To aid this, +we use the helping function retrieve described by Sulzmann and Lu: \\definition +of retrieve TODO\\ +This function assembles the bitcode that corresponds to a parse tree for how +the current derivative matches the suffix of the string(the characters that +have not yet appeared, but will appear as the successive derivatives go on, +how do we get this "future" information? By the value $v$, which is +computed by a pass of the algorithm that uses +$inj$ as described in the previous section). Sulzmann and Lu used this +to connect the bitcoded algorithm to the older algorithm by the following +equation: \begin{center} $inj \;a\; c \; v = \textit{decode} \; (\textit{retrieve}\; - ((\textit{internalise}\; r)\backslash_{simp} c) v)$ \end{center} A - little fact that needs to be stated to help comprehension: - \begin{center} $r^\uparrow = a$($a$ stands for $\textit{annotated}).$ - \end{center} Ausaf and Urban also used this fact to prove the - correctness of bitcoded algorithm without simplification. Our purpose - of using this, however, is try to establish \\ $ \textit{retrieve} \; - a \; v \;=\; \textit{retrieve} \; \textit{simp}(a) \; v'.$\\ The idea - is that using $v'$, a simplified version of $v$ that possibly had gone - through the same simplification step as $\textit{simp}(a)$ we are - still able to extract the bit-sequence that gives the same parsing - information as the unsimplified one. After establishing this, we - might be able to finally bridge the gap of proving\\ - $\textit{retrieve} \; r \backslash s \; v = \;\textit{retrieve} \; - \textit{simp}(r) \backslash s \; v'$\\ and subsequently\\ - $\textit{retrieve} \; r \backslash s \; v\; = \; \textit{retrieve} \; - r \backslash_{simp} s \; v'$.\\ This proves that our simplified - version of regular expression still contains all the bitcodes needed. + ((\textit{internalise}\; r)\backslash_{simp} c) v)$ + \end{center} + A little fact that needs to be stated to help comprehension: +\begin{center} + $r^\uparrow = a$($a$ stands for $\textit{annotated}).$ +\end{center} +Ausaf and Urban also used this fact to prove the +correctness of bitcoded algorithm without simplification. Our purpose +of using this, however, is to establish +\begin{center} +$ \textit{retrieve} \; +a \; v \;=\; \textit{retrieve} \; \textit{simp}(a) \; v'.$ +\end{center} +The idea +is that using $v'$, a simplified version of $v$ that possibly had gone +through the same simplification step as $\textit{simp}(a)$ we are +able to extract the bit-sequence that gives the same parsing +information as the unsimplified one. After establishing this, we +might be able to finally bridge the gap of proving +\begin{center} +$\textit{retrieve} \; r \backslash s \; v = \;\textit{retrieve} \; +\textit{simp}(r) \backslash s \; v'$ +\end{center} +and subsequently +\begin{center} +$\textit{retrieve} \; r \backslash s \; v\; = \; \textit{retrieve} \; +r \backslash_{simp} s \; v'$. +\end{center} +This proves that our simplified +version of regular expression still contains all the bitcodes needed. The second task is to speed up the more aggressive simplification.