cst_tests: comparison ninems/ninems.tex

equal deleted inserted replaced

-:25bbbb8b0e90
+:3ec403f650a8
 \end{tabular}
 \end{center}
 %\end{definition}
-Sulzmann and Lu's integrated the bitcodes into annotated regular expressions by attaching them to the head of every substructure of a regular expression\emph{annotated regular expressions}~\cite{Sulzmann2014}. They are
+Sulzmann and Lu's integrated the bitcodes into annotated regular expressions by attaching them to the head of every substructure of a regular expression\cite{Sulzmann2014}. They are
 defined by the following grammar:
 \begin{center}
 \begin{tabular}{lcl}
 $\textit{a}$ & $::=$  & $\textit{ZERO}$\\
 & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
 & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
 & & $\;\;\textit{else}\;\textit{None}$
 \end{tabular}
 \end{center}
+Here $(r^\uparrow)\backslash s$ is similar to what we have previously defined for
+$r\backslash s$.
 The main point of the bitsequences and annotated regular expressions
 is that we can apply rather aggressive (in terms of size)
 simplification rules in order to keep derivatives small.
 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification
 at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions
 flatten and distinct to open up nested ALT and reduce as many duplicates as possible.
 Function distinct  keeps the first occurring copy only and remove all later ones when detected duplicates.
 Function flatten opens up nested ALT. Its recursive definition is given below:
-\\
+\begin{center}
-\[flatten ALT bs rs :: rss = (map fuse( bs,_)  rs )@ flatten rss
+\begin{tabular}{@{}lcl@{}}
-flatten ZERO :: rss = flatten rss
+$\textit{flatten} \; (\textit{ALT}\;bs\,as) :: as'$ & $\dn$ & $(\textit{ map fuse}( \textit{bs, \_} )  \textit{ as}) \; +\!+ \; \textit{flatten} \; as' $ \\
-flatten r::rss = r @ flatten rss if r is of any other shape
+$\textit{flatten} \; \textit{ZERO} :: as'$ & $\dn$ & $ \textit{flatten} \;  as' $ \\
-\]
+$\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; as' $
+\end{tabular}
+\end{center}
 Here flatten behaves like the traditional functional programming flatten function,
 what it does is basically removing parentheses like changing $a+(b+c)$ into $a+b+c$.
-If we apply simplification after each derivative step, we get an optimized version of the algorithm:
+Suppose we apply simplification after each derivative step,
-\\TODO definition of $blexer_simp$
+and view these two operations as an atomic one: $a \backslash_{simp} c \dn \textit{simp}(a \backslash c)$.
+Then we can use the previous natural extension from derivative w.r.t   character to derivative w.r.t string:
+\begin{center}
+\begin{tabular}{lcl}
+$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp} c) \backslash_{simp} s$ \\
+$r \backslash [\,] $ & $\dn$ & $r$
+\end{tabular}
+\end{center}
+we get an optimized version of the algorithm:
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
+$\textit{let}\;a = (r^\uparrow)\backslash_{simp} s\;\textit{in}$\\
+& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
+& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
+& & $\;\;\textit{else}\;\textit{None}$
+\end{tabular}
+\end{center}
 This algorithm effectively keeps the regular expression size small, for example,
 with this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6 and stay constant, however long the input string is.
 The crucial point is to find the "gene" of a regular expression and how it is kept intact during simplification.
 To aid this, we are utilizing the helping function retrieve described by Sulzmann and Lu:
 \\definition of retrieve\\
 This function assembled the bitcode that corresponds to a parse tree for how the current derivative mathces the suffix of the string(the characters that have not yet appeared, but is stored in the value).
 Sulzmann and Lu used this to connect the bit-coded algorithm to the older algorithm by the following equation:\\
-inj r c v = decode retrieve (bder c (internalise r)) v
+$inj \;a\; c \; v = \textit{decode} \; (\textit{retrieve}\; ((\textit{internalise}\; r)\backslash_{simp} c) v)$\\
+A little fact that needs to be stated to help comprehension:\\
+$r^\uparrow = a$($a$ stands for $annotated$).\\
 Fahad and Christian also used this fact to prove  the correctness of bit-coded algorithm without simplificaiton.
 Our purpose of using this, however, is try to establish \\
-$ retrieve \; r \; v \;=\; retrieve  \; simp(r) \; v'.$\\
+$ \textit{retrieve} \; a \; v \;=\; \textit{retrieve}  \; \textit{simp}(a) \; v'.$\\
-Although the r and v are different now,  we can still extract the bitsequence that gies the same parsing information.
+The idea is that using $v'$,
-After establishing this, we might be able to finally bridge the gap of proving
+a simplified version of $v$ that possibly had gone through the same simplification step as $\textit{simp}(a)$ we are still  able to extract the bitsequence that gievs the same parsing information as the unsimplified one.
-$retrieve \; r  \; \backslash  \;s \; v = \;retrieve \; simp(r) \; \backslash \; s \; v'$
+After establishing this, we might be able to finally bridge the gap of proving\\
-and subsequently
+$\textit{retrieve} \; r   \backslash  s \; v = \;\textit{retrieve} \; \textit{simp}(r)  \backslash  s \; v'$\\
-$retrieve \; r \; \backslash \; s \; v\; = \; retrieve \; r \; \backslash{with simplification}  \; s \; v'$.
+and subsequently\\
+$\textit{retrieve} \; r \backslash  s \; v\; = \; \textit{retrieve} \; r  \backslash_{simp}   s \; v'$.\\
 This proves that our simplified version of regular expression still contains all the bitcodes neeeded.
 The second task is to speed up the more aggressive simplification. Currently it is slower than a naive simplifiction(the naive version as implemented in ADU of course can explode in some cases).
 So it needs to be explored how to make it faster. Our possibility would be to explore again the connection to DFAs. This is very much work in progress.

changeset 53	3ec403f650a8
parent 52	25bbbb8b0e90
child 54	4bfd28722884