cst_tests: comparison ninems/ninems.tex

equal deleted inserted replaced

-:481c8000de6d
+:d9d61a648292
 $abc$ in one star iteration, using the longest alternative $abc$ in the
 sub-expression $((((a+b)+ab)+c)+abc)$ (we use $r$ to denote this sub-expression
 for conciseness).
 Before $\textit{inj}$ comes into play,
 our lexer first builds derivative using string $abc$ (we simplified some regular expressions like
-$0 \cdot b$ to $0$ for conciseness; we also omit parentheses if
+$\ZERO \cdot b$ to $\ZERO$ for conciseness; we also omit parentheses if
 they are clear from the context):
 %Similarly, we allow
 %$\textit{ALT}$ to take a list of regular expressions as an argument
 %instead of just 2 operands to reduce the nested depth of
 %$\textit{ALT}$
 \cite{Antimirov95}. He defined the \emph{partial derivatives} of regular
 expressions as follows:
 \begin{center}
 \begin{tabular}{lcl}
-$\textit{pder} \; c \; 0$ & $\dn$ & $\emptyset$\\
+$\textit{pder} \; c \; \ZERO$ & $\dn$ & $\emptyset$\\
-$\textit{pder} \; c \; 1$ & $\dn$ & $\emptyset$ \\
+$\textit{pder} \; c \; \ONE$ & $\dn$ & $\emptyset$ \\
-$\textit{pder} \; c \; d$ & $\dn$ & $\textit{if} \; c \,=\, d \; \{  1   \}  \; \textit{else} \; \emptyset$ \\
+$\textit{pder} \; c \; d$ & $\dn$ & $\textit{if} \; c \,=\, d \; \{  \ONE   \}  \; \textit{else} \; \emptyset$ \\
 $\textit{pder} \; c \; r_1+r_2$ & $\dn$ & $pder \; c \; r_1 \cup pder \; c \;  r_2$ \\
 $\textit{pder} \; c \; r_1 \cdot r_2$ & $\dn$ & $\textit{if} \; nullable \; r_1 $\\
 & & $\textit{then} \; \{  r \cdot r_2 \mid r \in pder \; c \; r_1   \}  \cup pder \; c \; r_2 \;$\\
 & & $\textit{else} \; \{  r \cdot r_2 \mid r \in pder \; c \; r_1   \} $ \\
 $\textit{pder} \; c \; r^*$ & $\dn$ & $ \{  r' \cdot r^* \mid r' \in pder \; c \; r   \}  $ \\
 \noindent
 After internalise we do successive derivative operations on the
 annotated regular expressions. This derivative operation is the same as
 what we previously have for the simple regular expressions, except that
 we beed to take special care of the bitcodes:\comment{You need to be consitent with  ALTS and ALT; ALT is just
-an abbreviation; derivations and so on are defined for ALTS}
+an abbreviation; derivations and so on are defined for ALTS}\comment{no this is not the case,
+ALT for 2 regexes, ALTS for a list}
 %\begin{definition}{bder}
 \begin{center}
 \begin{tabular}{@{}lcl@{}}
 $(\textit{ZERO})\,\backslash c$ & $\dn$ & $\textit{ZERO}$\\
 \begin{center}
 \begin{tabular}{@{}lcl@{}}
 $\textit{simp} \; (\textit{SEQ}\;bs\,a_1\,a_2)$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp}  \; a_2) \; \textit{match} $ \\
-&&$\quad\textit{case} \; (0, \_) \Rightarrow  0$ \\
+&&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
-&&$\quad\textit{case} \; (\_, 0) \Rightarrow  0$ \\
+&&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
-&&$\quad\textit{case} \;  (1, a_2') \Rightarrow  \textit{fuse} \; bs \;  a_2'$ \\
+&&$\quad\textit{case} \;  (\ONE, a_2') \Rightarrow  \textit{fuse} \; bs \;  a_2'$ \\
-&&$\quad\textit{case} \; (a_1', 1) \Rightarrow  \textit{fuse} \; bs \;  a_1'$ \\
+&&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow  \textit{fuse} \; bs \;  a_1'$ \\
 &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow  \textit{SEQ} \; bs \; a_1' \;  a_2'$ \\
 $\textit{simp} \; (\textit{ALTS}\;bs\,as)$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $ \\
-&&$\quad\textit{case} \; [] \Rightarrow  0$ \\
+&&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
 &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
-&&$\quad\textit{case} \;  as' \Rightarrow  \textit{ALT}\;bs\;as'$\\
+&&$\quad\textit{case} \;  as' \Rightarrow  \textit{ALTS}\;bs\;as'$\\
 $\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$
 \end{tabular}
 \end{center}
 $\ONE$, which might trigger further simplification at the current level.
 The most involved part is the $\textit{ALTS}$ clause, where we use two
 auxiliary functions flatten and distinct to open up nested
 $\textit{ALTS}$ and reduce as many duplicates as possible. Function
 distinct  keeps the first occurring copy only and remove all later ones
-when detected duplicates. Function flatten opens up nested \textit{ALT}.
+when detected duplicates. Function flatten opens up nested \textit{ALTS}.
 Its recursive definition is given below:
 \begin{center}
 \begin{tabular}{@{}lcl@{}}
-$\textit{flatten} \; (\textit{ALT}\;bs\,as) :: as'$ & $\dn$ & $(\textit{map} \;
+$\textit{flatten} \; (\textit{ALTS}\;bs\,as) :: as'$ & $\dn$ & $(\textit{map} \;
 (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flatten} \; as' $ \\
 $\textit{flatten} \; \textit{ZERO} :: as'$ & $\dn$ & $ \textit{flatten} \;  as' $ \\
 $\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; as'$ \quad(otherwise)
 \end{tabular}
 \end{center}
 w.r.t.~string:\comment{simp in  the [] case?}
 \begin{center}
 \begin{tabular}{lcl}
 $r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
-$r \backslash [\,] $ & $\dn$ & $r$
+$r \backslash_{simp} [\,] $ & $\dn$ & $r$
 \end{tabular}
 \end{center}
 \noindent
 we obtain an optimised version of the algorithm:
 \noindent We would settle this correctness claim. It is relatively
 straightforward to establish that after one simplification step, the part of a
 nullable derivative that corresponds to a POSIX value remains intact and can
 still be collected, in other words, we can show that\comment{Double-check....I
-think this  is not the case}
+think this  is not the case}\comment{If i remember correctly, you have proved this lemma.
+I feel this is indeed not true because you might place arbitrary
-\begin{center}
+bits on the regex r, however if this is the case, did i remember wrongly that
-$\textit{bmkeps} \; r = \textit{bmkeps} \; \textit{simp} \; r\;( r\; \textit{nullable})$
+you proved something like simplification does not affect $\textit{bmkeps}$ results?
+Anyway, i have amended this a little bit so it does not allow arbitrary bits attached
+to a regex. Maybe it works now.}
+\begin{center}
+$\textit{bmkeps} \; a = \textit{bmkeps} \; \textit{bsimp} \; a\;( a\; \textit{bnullable} and \textit{decode}(r, \textit{bmkeps}(a)) is a \textit{POSIX} value)$
 \end{center}
 \noindent
 as this basically comes down to proving actions like removing the
 additional $r$ in $r+r$  does not delete important POSIX information in
 	$\textit{bmkeps} \; \textit{blexer}\_{simp}(s, \; r) = \textit{bmkeps} \; \textit{blexer} \; \textit{simp}(s, \; r)$
 \end{center}
 \noindent That is, if we do derivative on regular expression $r$ and then
 simplify it, and repeat this process until we exhaust the string, we get a
-regular expression $r''$ that provides the POSIX matching as the result $r'$ of
+regular expression $r''$($\textit{LHS}$)  that provides the POSIX matching
-the normal derivative algorithm which only does derivative operation repeatedly
+information, which is exactly the same as the result $r'$($\textit{RHS}$ of the
-and has no simplification at all.  This might seem at first glance very
+normal derivative algorithm that only does derivative repeatedly and has no
-unintuitive, as the $r'$ is exponentially larger than $r''$. But this can be
+simplification at all.  This might seem at first glance very unintuitive, as
-explained in the following way: we are pruning away the possible matches that
+the $r'$ is exponentially larger than $r''$, but can be explained in the
-are not POSIX. Since there are exponentially non-POSIX matchings and only 1
+following way: we are pruning away the possible matches that are not POSIX.
-POSIX matching, it is understandable that our $r''$ can be a lot smaller.  we
+Since there are exponentially non-POSIX matchings and only 1 POSIX matching, it
-can still provide the same POSIX value if there is one.  This is not as
+is understandable that our $r''$ can be a lot smaller.  we can still provide
-straightforward as the previous proposition, as the two regular expressions $r$
+the same POSIX value if there is one.  This is not as straightforward as the
-and $\textit{simp}\; r$ might become very different regular expressions after
+previous proposition, as the two regular expressions $r'$ and $r''$ might have
-repeated application of $\textit{simp}$ and derivative.  The crucial point is
+become very different regular expressions.  The crucial point is to find the
-to find the indispensable information of a regular expression and how it is
+$\textit{POSIX}$  information of a regular expression and how it is modified,
-kept intact during simplification so that it performs as good as a regular
+augmented and propagated
-expression that has not been simplified in the subsequent derivative
+during simplification in parallel with the regularr expression that
-operations.  To aid this, we use the helping function retrieve described by
+has not been simplified in the subsequent derivative operations.  To aid this,
-Sulzmann and Lu: \\definition of retrieve\\
+we use the helping function retrieve described by Sulzmann and Lu: \\definition
+of retrieve TODO\\
-This function assembled the bitcode that corresponds to a parse tree for
+This function assembles the bitcode that corresponds to a parse tree for how
-how the current derivative matches the suffix of the string(the
+the current derivative matches the suffix of the string(the characters that
-characters that have not yet appeared, but is stored in the value).
+have not yet appeared, but will appear as the successive derivatives go on,
-Sulzmann and Lu used this to connect the bitcoded algorithm to the older
+how do we get this "future" information? By the value $v$, which is
-algorithm by the following equation:
+computed by a pass of the algorithm that uses
+$inj$ as described in the previous section).  Sulzmann and Lu used this
+to connect the bitcoded algorithm to the older algorithm by the following
+equation:
 \begin{center} $inj \;a\; c \; v = \textit{decode} \; (\textit{retrieve}\;
-	 ((\textit{internalise}\; r)\backslash_{simp} c) v)$ \end{center} A
+	 ((\textit{internalise}\; r)\backslash_{simp} c) v)$
-	 little fact that needs to be stated to help comprehension:
+\end{center}
-	 \begin{center} $r^\uparrow = a$($a$ stands for $\textit{annotated}).$
+A little fact that needs to be stated to help comprehension:
-	 \end{center} Ausaf and Urban also used this fact to prove  the
+\begin{center}
-	 correctness of bitcoded algorithm without simplification.  Our purpose
+	$r^\uparrow = a$($a$ stands for $\textit{annotated}).$
-	 of using this, however, is try to establish \\ $ \textit{retrieve} \;
+\end{center}
-	 a \; v \;=\; \textit{retrieve}  \; \textit{simp}(a) \; v'.$\\ The idea
+Ausaf and Urban also used this fact to prove  the
-	 is that using $v'$, a simplified version of $v$ that possibly had gone
+correctness of bitcoded algorithm without simplification.  Our purpose
-	 through the same simplification step as $\textit{simp}(a)$ we are
+of using this, however, is to establish
-	 still  able to extract the bit-sequence that gives the same parsing
+\begin{center}
-	 information as the unsimplified one.  After establishing this, we
+$ \textit{retrieve} \;
-	 might be able to finally bridge the gap of proving\\
+a \; v \;=\; \textit{retrieve}  \; \textit{simp}(a) \; v'.$
-	 $\textit{retrieve} \; r   \backslash  s \; v = \;\textit{retrieve} \;
+\end{center}
-	 \textit{simp}(r)  \backslash  s \; v'$\\ and subsequently\\
+The idea
-	 $\textit{retrieve} \; r \backslash  s \; v\; = \; \textit{retrieve} \;
+is that using $v'$, a simplified version of $v$ that possibly had gone
-	 r  \backslash_{simp}   s \; v'$.\\ This proves that our simplified
+through the same simplification step as $\textit{simp}(a)$ we are
-	 version of regular expression still contains all the bitcodes needed.
+able to extract the bit-sequence that gives the same parsing
+information as the unsimplified one.  After establishing this, we
+might be able to finally bridge the gap of proving
+\begin{center}
+$\textit{retrieve} \; r   \backslash  s \; v = \;\textit{retrieve} \;
+\textit{simp}(r)  \backslash  s \; v'$
+\end{center}
+and subsequently
+\begin{center}
+$\textit{retrieve} \; r \backslash  s \; v\; = \; \textit{retrieve} \;
+r  \backslash_{simp}   s \; v'$.
+\end{center}
+This proves that our simplified
+version of regular expression still contains all the bitcodes needed.
 The second task is to speed up the more aggressive simplification.
 Currently it is slower than a naive simplification(the naive version as
 implemented in ADU of course can explode in some cases). So it needs to

changeset 80	d9d61a648292
parent 79	481c8000de6d
child 81	a0df84875788