cst_tests: comparison ninems/ninems.tex

equal deleted inserted replaced

-:cab5eab1f6f1
+:0573615e41a3
 that the hole will be on $r_a$. So we recursively call $\inj\,
 r_a\,c\,v_a$ to fill that hole in $v_a$. After injection, the value
 $v_i$ for $r_i = r_a \cdot r_b$ should be $\Seq\,(\inj\,r_a\,c\,v_a)\,v_b$.
 Other clauses can be understood in a similar way.
-The following example gives a \comment{Other word: insight?}taste of $\textit{inj}$'s effect
+%\comment{Other word: insight?}
+The following example gives an insight of $\textit{inj}$'s effect
 and how Sulzmann and Lu's algorithm works as a whole.
 Suppose we have a
 regular expression $((((a+b)+ab)+c)+abc)^*$, and want to match it against
 the string $abc$ (when $abc$ is written as a regular expression, the most
 standard way of expressing it should be $a \cdot (b \cdot c)$. We omit
 store the information for parse trees inside the regular expression,
 update it when we do derivative on them, and collect the information
 when finished with derivatives and call $mkeps$ for deciding which
 branch is POSIX, we can generate the parse tree in one pass, instead of
 doing the rest $n$ injections. This leads to Sulzmann and Lu's novel
-idea of using bit-codes in derivatives.
+idea of using bitcodes in derivatives.
 In the next section, we shall focus on the bit-coded algorithm and the
 process of simplification of regular expressions. This is needed in
 order to obtain \emph{fast} versions of the Brzozowski's, and Sulzmann
 and Lu's algorithms.  This is where the PhD-project aims to advance the
 as the simplifications of $\ZERO + r$, $r + \ZERO$, $\ONE\cdot r$, $r
 \cdot \ONE$, and $r + r$ to just $r$. These simplifications do not
 affect the answer for whether a regular expression matches a string or
 not, but fortunately also do not affect the POSIX strategy of how
 regular expressions match strings---although the latter is much harder
-to establish. \comment{Does not make sense.} The argument for
+to establish. Some initial results in this regard have been
-complicating the data structures from basic regular expressions to those
-with bitcodes is that we can introduce simplification without making the
-algorithm crash or overly complex to reason about. The latter is crucial
-for a correctness proof. Some initial results in this regard have been
 obtained in \cite{AusafDyckhoffUrban2016}.
 Unfortunately, the simplification rules outlined above  are not
 sufficient to prevent an explosion for all regular expression. We
 believe a tighter bound can be achieved that prevents an explosion in
 \noindent
 A partial derivative of a regular expression $r$ is essentially a set of
 regular expressions that are either $r$'s children expressions or a
 concatenation of them. Antimirov has proved a tight bound of the size of
-partial derivatives. \comment{That looks too preliminary to me.} Roughly
+partial derivatives. Roughly
-speaking the size will not exceed the fourth power of the number of
+speaking the size will not exceed $t^2$, t is the number of characters
-nodes in that regular expression. \comment{Improve: which
+appearing in that regular expression. If we want the size of derivatives
-simplifications?}Interestingly, we observed from experiment that after
+to stay below this bound, we would need more aggressive simplifications
-the simplification step, our regular expression has the same size or is
+such as opening up alternatives to achieve the maximum level of duplicates
-smaller than the partial derivatives. This allows us to prove a tight
+cancellation.
-bound on the size of regular expression during the running time of the
+For example, the parentheses in $(a+b) \cdot c + bc$ can be opened up to
-algorithm if we can establish the connection between our simplification
+get $a\cdot c +b \cdot c
-rules and partial derivatives.
++ b \cdot c$, and then simplified to $a \cdot c+b \cdot c$. Another example is from
+$(a^*+a) + (a^*+ 1) + (a +1)$ to $a^*+a+1$.
+Adding these more aggressive simplification rules
+helped us to achieve the same size bound as that of the partial derivatives.
+To introduce these "spilling out alternatives" simplifications
+and make the correctness proof easier,
+we used bitcodes.
+%This allows us to prove a tight
+%bound on the size of regular expression during the running time of the
+%algorithm if we can establish the connection between our simplification
+%rules and partial derivatives.
 %We believe, and have generated test
 %data, that a similar bound can be obtained for the derivatives in
 %Sulzmann and Lu's algorithm. Let us give some details about this next.
-Bit-codes look like this:
+Bitcodes look like this:
 \begin{center}
 		$b ::=   S \mid  Z \; \;\;
 bs ::= [] \mid b:bs
 $
 \end{center}
-They are just a string of bits, the names $S$ and $Z$  here are quite arbitrary, we can use 0 and 1 or any other set of binary symbols to substitute them. Bit-codes(or bit-sequences) are a compact form of parse trees.
+They are just a string of bits,
-Here is how values and bit-codes are related:
+the names $S$ and $Z$  here are quite arbitrary, we can use 0 and 1
+or any other set of binary symbols to substitute them.
+Bitcodes(or bit-sequences) are a compact form of parse trees.
 Bitcodes are essentially incomplete values.
 This can be straightforwardly seen in the following transformation:
 \begin{center}
 \begin{tabular}{lcl}
 $\textit{code}(\Empty)$ & $\dn$ & $[]$\\
 \textit{else}\;\textit{None}$
 \end{tabular}
 \end{center}
 %\end{definition}
+Sulzmann and Lu's integrated the bitcodes into regular
-Sulzmann and Lu's integrated the bitcodes into annotated regular
+expressions to create annotated regular expressions.
-expressions by attaching them to the head of every substructure of a
+It is by attaching them to the head of every substructure of a
-regular expression\cite{Sulzmann2014}. They are defined by the following
+regular expression\cite{Sulzmann2014}. Annotated regular expressions
+are defined by the following
 grammar:
 \begin{center}
 \begin{tabular}{lcl}
 $\textit{a}$ & $::=$  & $\textit{ZERO}$\\
 & $\mid$ & $\textit{ONE}\;\;bs$\\
 & $\mid$ & $\textit{CHAR}\;\;bs\,c$\\
-& $\mid$ & $\textit{ALTS}\;\;bs\,as$\\
+& $\mid$ & $\textit{ALT}\;\;bs\,as$\\
 & $\mid$ & $\textit{SEQ}\;\;bs\,a_1\,a_2$\\
 & $\mid$ & $\textit{STAR}\;\;bs\,a$
 \end{tabular}
 \end{center}
 \noindent
-where $bs$ stands for bitsequences, and $as$ (in \textit{ALTS}) for a
+where $bs$ stands for bit-sequences, and $as$ (in \textit{ALTS}) for a
-list of annotated regular expressions. These bitsequences encode
+list of annotated regular expressions. These bit-sequences encode
 information about the (POSIX) value that should be generated by the
 Sulzmann and Lu algorithm.
 To do lexing using annotated regular expressions, we shall first
 transform the usual (un-annotated) regular expressions into annotated
 \end{tabular}
 \end{center}
 Here $(r^\uparrow)\backslash s$ is similar to what we have previously defined for
 $r\backslash s$.
-The main point of the bitsequences and annotated regular expressions
+The main point of the bit-sequences and annotated regular expressions
 is that we can apply rather aggressive (in terms of size)
 simplification rules in order to keep derivatives small.
 We have
 developed such ``aggressive'' simplification rules and generated test
 $\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; as'$ \quad(otherwise)
 \end{tabular}
 \end{center}
 \noindent
-\comment{No: functional flatten  does not remove ZEROs}Here flatten behaves like the traditional functional programming flatten function,
+Here flatten behaves like the traditional functional programming flatten function, except that it also removes $\ZERO$s.
-what it does is basically removing parentheses like changing $a+(b+c)$ into $a+b+c$.
+What it does is basically removing parentheses like changing $a+(b+c)$ into $a+b+c$.
 Suppose we apply simplification after each derivative step,
 and view these two operations as an atomic one: $a \backslash_{simp} c \dn \textit{simp}(a \backslash c)$.
 Then we can use the previous natural extension from derivative w.r.t   character to derivative w.r.t string:
 have to work out all proof details.''
 \end{quote}
 \noindent
 We would settle the correctness claim.
-It is relatively straightforward to establish that after one simplification step, the part of derivative that corresponds to a POSIX value remains intact and can still be collected, in other words,
+It is relatively straightforward to establish that after one simplification step,
-\comment{that  only holds when r is nullable}bmkeps r = bmkeps simp r
+the part of a nullable derivative that corresponds to a POSIX value
-as this basically comes down to proving actions like removing the additional $r$ in $r+r$  does not delete important POSIX information in a regular expression.
+remains intact and can still be collected,
+in other words,
+\begin{center}
+$\textit{bmkeps} \; r = \textit{bmkeps} \; \textit{simp} \; r\;( r\; \textit{nullable})$
+\end{center}
+as this basically comes down to proving actions like removing
+the additional $r$ in $r+r$  does not delete important POSIX information
+in a regular expression.
 The hardcore of this problem is to prove that
-bmkeps bders r = bmkeps bders simp r
+\begin{center}
-That is, if we do derivative on regular expression r and the simplified version for, they can still prove the same POSIX value if there is one . This is not as straightforward as the previous proposition, as the two regular expression r and simp r  might become very different regular expressions after repeated application ofd simp and derivative.
+$\textit{bmkeps} \; \textit{blexer}\_{simp} \; r = \textit{bmkeps} \; \textit{blexer} \; \textit{simp} \; r$
-The crucial point is to find the \comment{What?}"gene" of a regular expression and how it is kept intact during simplification.
+\end{center}
-To aid this, we are use the helping function retrieve described by Sulzmann and Lu:
+That is, if we do derivative on regular expression r and the simplified version,
+they can still provide the same POSIX value if there is one .
+This is not as straightforward as the previous proposition, as the two regular expressions $r$ and $\textit{simp}\; r$
+might become very different regular expressions after repeated application of $\textit{simp}$ and derivative.
+The crucial point is to find the indispensable information of
+a regular expression and how it is kept intact during simplification so that it performs
+as good as a regular expression that has not been simplified in the subsequent derivative operations.
+To aid this, we use the helping function retrieve described by Sulzmann and Lu:
 \\definition of retrieve\\
-This function assembled the bitcode that corresponds to a parse tree for how the current derivative matches the suffix of the string(the characters that have not yet appeared, but is stored in the value).
+This function assembles the bitcode that corresponds to a parse tree for how the current
+derivative matches the suffix of the string(the characters that have not yet appeared, but is stored in the value).
 Sulzmann and Lu used this to connect the bit-coded algorithm to the older algorithm by the following equation:\\
 $inj \;a\; c \; v = \textit{decode} \; (\textit{retrieve}\; ((\textit{internalise}\; r)\backslash_{simp} c) v)$\\
 A little fact that needs to be stated to help comprehension:\\
 $r^\uparrow = a$($a$ stands for $annotated$).\\
 Fahad and Christian also used this fact to prove  the correctness of bit-coded algorithm without simplification.
 Our purpose of using this, however, is try to establish \\
 $ \textit{retrieve} \; a \; v \;=\; \textit{retrieve}  \; \textit{simp}(a) \; v'.$\\
 The idea is that using $v'$,
-a simplified version of $v$ that possibly had gone through the same simplification step as $\textit{simp}(a)$ we are still  able to extract the bitsequence that gives the same parsing information as the unsimplified one.
+a simplified version of $v$ that possibly had gone through the same simplification step as $\textit{simp}(a)$ we are still  able to extract the bit-sequence that gives the same parsing information as the unsimplified one.
 After establishing this, we might be able to finally bridge the gap of proving\\
 $\textit{retrieve} \; r   \backslash  s \; v = \;\textit{retrieve} \; \textit{simp}(r)  \backslash  s \; v'$\\
 and subsequently\\
 $\textit{retrieve} \; r \backslash  s \; v\; = \; \textit{retrieve} \; r  \backslash_{simp}   s \; v'$.\\
 This proves that our simplified version of regular expression still contains all the bitcodes needed.

changeset 71	0573615e41a3
parent 70	cab5eab1f6f1
child 72	83b021fc7d29