--- a/ninems/ninems.tex Mon Jul 08 11:10:32 2019 +0100
+++ b/ninems/ninems.tex Mon Jul 08 21:21:54 2019 +0100
@@ -552,7 +552,8 @@
$v_i$ for $r_i = r_a \cdot r_b$ should be $\Seq\,(\inj\,r_a\,c\,v_a)\,v_b$.
Other clauses can be understood in a similar way.
-The following example gives a \comment{Other word: insight?}taste of $\textit{inj}$'s effect
+%\comment{Other word: insight?}
+The following example gives an insight into $\textit{inj}$'s effect
and how Sulzmann and Lu's algorithm works as a whole.
Suppose we have a
regular expression $((((a+b)+ab)+c)+abc)^*$, and want to match it against
@@ -674,7 +675,7 @@
when finished with derivatives and call $mkeps$ for deciding which
branch is POSIX, we can generate the parse tree in one pass, instead of
doing the remaining $n$ injections. This leads to Sulzmann and Lu's novel
-idea of using bit-codes in derivatives.
+idea of using bitcodes in derivatives.
In the next section, we shall focus on the bit-coded algorithm and the
process of simplification of regular expressions. This is needed in
@@ -712,11 +713,7 @@
affect the answer for whether a regular expression matches a string or
not, but fortunately also do not affect the POSIX strategy of how
regular expressions match strings---although the latter is much harder
-to establish. \comment{Does not make sense.} The argument for
-complicating the data structures from basic regular expressions to those
-with bitcodes is that we can introduce simplification without making the
-algorithm crash or overly complex to reason about. The latter is crucial
-for a correctness proof. Some initial results in this regard have been
+to establish. Some initial results in this regard have been
obtained in \cite{AusafDyckhoffUrban2016}.
Unfortunately, the simplification rules outlined above are not
@@ -745,28 +742,41 @@
A partial derivative of a regular expression $r$ is essentially a set of
regular expressions that are either $r$'s child expressions or a
concatenation of them. Antimirov has proved a tight bound on the size of
-partial derivatives. \comment{That looks too preliminary to me.} Roughly
-speaking the size will not exceed the fourth power of the number of
-nodes in that regular expression. \comment{Improve: which
-simplifications?}Interestingly, we observed from experiment that after
-the simplification step, our regular expression has the same size or is
-smaller than the partial derivatives. This allows us to prove a tight
-bound on the size of regular expression during the running time of the
-algorithm if we can establish the connection between our simplification
-rules and partial derivatives.
+partial derivatives. Roughly
+speaking, the size will not exceed $t^2$, where $t$ is the number of characters
+appearing in that regular expression. If we want the size of derivatives
+to stay below this bound, we would need more aggressive simplifications,
+such as opening up alternatives to achieve the maximum level of duplicate
+cancellation.
+For example, the parentheses in $(a+b) \cdot c + bc$ can be opened up to
+get $a\cdot c +b \cdot c
++ b \cdot c$, and then simplified to $a \cdot c+b \cdot c$. Another example is from
+$(a^*+a) + (a^*+ 1) + (a +1)$ to $a^*+a+1$.
+Adding these more aggressive simplification rules
+ helped us to achieve the same size bound as that of the partial derivatives.
+To introduce these ``spilling out alternatives'' simplifications
+and to make the correctness proof easier,
+we used bitcodes.
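+To make this concrete, here is a small sketch in Scala of the
+``opening up'' step on plain regular expressions (our own
+illustration: the list-based \textit{ALTS} representation and the
+function names are choices made for this sketch, not the exact
+implementation):
+\begin{verbatim}
+// A sketch of the "opening up alternatives" simplification on
+// plain regular expressions, where ALTS holds a list of
+// alternatives.
+abstract class Rexp
+case object ZERO extends Rexp
+case object ONE extends Rexp
+case class CHAR(c: Char) extends Rexp
+case class ALTS(rs: List[Rexp]) extends Rexp
+case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
+case class STAR(r: Rexp) extends Rexp
+
+// Turn a regular expression into a flat list of alternatives,
+// dissolving nested ALTS and distributing a sequence over
+// alternatives on the left.
+def open(r: Rexp): List[Rexp] = r match {
+  case ALTS(rs)         => rs.flatMap(open)
+  case SEQ(ALTS(rs), s) => rs.flatMap(r1 => open(SEQ(r1, s)))
+  case r1               => List(r1)
+}
+
+// Rebuild the alternative with duplicates cancelled:
+// (a+b)c + bc  ~>  ac + bc + bc  ~>  ac + bc
+def simpAlts(r: Rexp): Rexp = open(r).distinct match {
+  case List(r1) => r1
+  case rs       => ALTS(rs)
+}
+\end{verbatim}
+\noindent
+On the two examples above, \texttt{open} followed by
+\texttt{distinct} produces exactly $a \cdot c + b \cdot c$ and
+$a^*+a+1$.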
+
+%This allows us to prove a tight
+%bound on the size of regular expression during the running time of the
+%algorithm if we can establish the connection between our simplification
+%rules and partial derivatives.
%We believe, and have generated test
%data, that a similar bound can be obtained for the derivatives in
%Sulzmann and Lu's algorithm. Let us give some details about this next.
-Bit-codes look like this:
+Bitcodes look like this:
\begin{center}
$b ::= S \mid Z \qquad
bs ::= [] \mid b:bs$
\end{center}
-They are just a string of bits, the names $S$ and $Z$ here are quite arbitrary, we can use 0 and 1 or any other set of binary symbols to substitute them. Bit-codes(or bit-sequences) are a compact form of parse trees.
-Here is how values and bit-codes are related:
+They are just a string of bits;
+the names $S$ and $Z$ here are quite arbitrary: we could use 0 and 1
+or any other pair of binary symbols in their place.
+Bitcodes (or bit-sequences) are a compact form of parse trees.
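+For instance, in a Scala sketch the bits could be represented as
+follows; the conventions stated in the comments are one possible
+choice, made up for this illustration:
+\begin{verbatim}
+// A bitcode is a list of bits; the names are arbitrary. Under
+// one possible convention, Z records a Left choice inside an
+// alternative and S a Right choice; for stars, Z marks one more
+// iteration and S marks the end of the iterations.
+abstract class Bit
+case object Z extends Bit
+case object S extends Bit
+
+// Example: matching "ab" against (a+b)* could be coded as
+// List(Z, Z, Z, S, S), read as: Z (an iteration follows),
+// Z (Left, i.e. a), Z (another iteration), S (Right, i.e. b),
+// S (end of the star).
+\end{verbatim}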
Bitcodes are essentially incomplete values.
This can be straightforwardly seen in the following transformation:
\begin{center}
@@ -825,10 +835,11 @@
\end{center}
%\end{definition}
-
-Sulzmann and Lu's integrated the bitcodes into annotated regular
-expressions by attaching them to the head of every substructure of a
-regular expression\cite{Sulzmann2014}. They are defined by the following
+Sulzmann and Lu integrated the bitcodes into regular
+expressions to create annotated regular expressions \cite{Sulzmann2014}.
+This is done by attaching bit-sequences to the head of every substructure of a
+regular expression. Annotated regular expressions
+are defined by the following
grammar:
\begin{center}
@@ -836,15 +847,15 @@
$\textit{a}$ & $::=$ & $\textit{ZERO}$\\
& $\mid$ & $\textit{ONE}\;\;bs$\\
& $\mid$ & $\textit{CHAR}\;\;bs\,c$\\
- & $\mid$ & $\textit{ALTS}\;\;bs\,as$\\
+ & $\mid$ & $\textit{ALT}\;\;bs\,as$\\
& $\mid$ & $\textit{SEQ}\;\;bs\,a_1\,a_2$\\
& $\mid$ & $\textit{STAR}\;\;bs\,a$
\end{tabular}
\end{center}
\noindent
-where $bs$ stands for bitsequences, and $as$ (in \textit{ALTS}) for a
-list of annotated regular expressions. These bitsequences encode
+where $bs$ stands for bit-sequences, and $as$ (in \textit{ALT}) for a
+list of annotated regular expressions. These bit-sequences encode
information about the (POSIX) value that should be generated by the
Sulzmann and Lu algorithm.
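+As a concrete illustration, this grammar can be transcribed into a
+Scala datatype almost verbatim (a sketch; the constructor names are
+ours, with \texttt{Bit}, \texttt{Z} and \texttt{S} as in the earlier
+sketch):
+\begin{verbatim}
+// Annotated regular expressions, mirroring the grammar above.
+abstract class ARexp
+case object AZERO extends ARexp
+case class AONE(bs: List[Bit]) extends ARexp
+case class ACHAR(bs: List[Bit], c: Char) extends ARexp
+case class AALT(bs: List[Bit], as: List[ARexp]) extends ARexp
+case class ASEQ(bs: List[Bit], a1: ARexp, a2: ARexp) extends ARexp
+case class ASTAR(bs: List[Bit], a: ARexp) extends ARexp
+\end{verbatim}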
@@ -957,7 +968,7 @@
Here $(r^\uparrow)\backslash s$ is similar to what we have previously defined for
$r\backslash s$.
-The main point of the bitsequences and annotated regular expressions
+The main point of the bit-sequences and annotated regular expressions
is that we can apply rather aggressive (in terms of size)
simplification rules in order to keep derivatives small.
@@ -1014,8 +1025,8 @@
\end{center}
\noindent
- \comment{No: functional flatten does not remove ZEROs}Here flatten behaves like the traditional functional programming flatten function,
- what it does is basically removing parentheses like changing $a+(b+c)$ into $a+b+c$.
+ Here flatten behaves like the traditional functional programming flatten function, except that it also removes $\ZERO$s.
+ What it does is basically remove parentheses, for example changing $a+(b+c)$ into $a+b+c$.
Suppose we apply simplification after each derivative step,
and view these two operations as an atomic one: $a \backslash_{simp} c \dn \textit{simp}(a \backslash c)$.
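+A Scala sketch of this flatten operation, together with the auxiliary
+\texttt{fuse} for attaching a bit-sequence to the front of an
+annotated regular expression, might look as follows (the clause
+structure is our illustration of the description above, not
+necessarily the exact definition):
+\begin{verbatim}
+// fuse attaches a bit-sequence to the front of the top-most
+// bits of an annotated regular expression.
+def fuse(bs: List[Bit], a: ARexp): ARexp = a match {
+  case AZERO             => AZERO
+  case AONE(bs1)         => AONE(bs ::: bs1)
+  case ACHAR(bs1, c)     => ACHAR(bs ::: bs1, c)
+  case AALT(bs1, as)     => AALT(bs ::: bs1, as)
+  case ASEQ(bs1, a1, a2) => ASEQ(bs ::: bs1, a1, a2)
+  case ASTAR(bs1, a1)    => ASTAR(bs ::: bs1, a1)
+}
+
+// flatten: dissolve nested alternatives (pushing their bits
+// inward with fuse) and drop AZEROs, so that a+(b+c) becomes
+// a+b+c.
+def flats(as: List[ARexp]): List[ARexp] = as match {
+  case Nil                   => Nil
+  case AZERO :: rest         => flats(rest)
+  case AALT(bs, as1) :: rest => as1.map(fuse(bs, _)) ::: flats(rest)
+  case a :: rest             => a :: flats(rest)
+}
+\end{verbatim}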
@@ -1074,16 +1085,31 @@
\noindent
We would like to settle the correctness claim.
-It is relatively straightforward to establish that after one simplification step, the part of derivative that corresponds to a POSIX value remains intact and can still be collected, in other words,
-\comment{that only holds when r is nullable}bmkeps r = bmkeps simp r
-as this basically comes down to proving actions like removing the additional $r$ in $r+r$ does not delete important POSIX information in a regular expression.
+It is relatively straightforward to establish that after one simplification step,
+the part of a nullable derivative that corresponds to a POSIX value
+remains intact and can still be collected,
+in other words,
+\begin{center}
+$\textit{bmkeps} \; r = \textit{bmkeps} \; (\textit{simp} \; r)$ \quad (provided $r$ is nullable)
+\end{center}
+as this basically comes down to proving that actions like removing
+the additional $r$ in $r+r$ do not delete important POSIX information
+in a regular expression.
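+For concreteness, $\textit{bmkeps}$ and the $\textit{bnullable}$ test
+it relies on can be sketched in Scala as follows (under the bit
+conventions of the earlier sketches; the leftmost nullable alternative
+realises the POSIX choice):
+\begin{verbatim}
+// bnullable tests whether an annotated regular expression can
+// match the empty string.
+def bnullable(a: ARexp): Boolean = a match {
+  case AZERO           => false
+  case AONE(_)         => true
+  case ACHAR(_, _)     => false
+  case AALT(_, as)     => as.exists(bnullable)
+  case ASEQ(_, a1, a2) => bnullable(a1) && bnullable(a2)
+  case ASTAR(_, _)     => true
+}
+
+// bmkeps collects the bits of the POSIX value for the empty
+// string (only defined on nullable expressions).
+def bmkeps(a: ARexp): List[Bit] = a match {
+  case AONE(bs)         => bs
+  case AALT(bs, as)     => bs ::: bmkeps(as.find(bnullable).get)
+  case ASEQ(bs, a1, a2) => bs ::: bmkeps(a1) ::: bmkeps(a2)
+  case ASTAR(bs, _)     => bs ::: List(S)  // empty iteration list
+}
+\end{verbatim}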
The hard core of this problem is to prove that
-bmkeps bders r = bmkeps bders simp r
-That is, if we do derivative on regular expression r and the simplified version for, they can still prove the same POSIX value if there is one . This is not as straightforward as the previous proposition, as the two regular expression r and simp r might become very different regular expressions after repeated application ofd simp and derivative.
-The crucial point is to find the \comment{What?}"gene" of a regular expression and how it is kept intact during simplification.
-To aid this, we are use the helping function retrieve described by Sulzmann and Lu:
+\begin{center}
+$\textit{bmkeps} \; (r\backslash s) = \textit{bmkeps} \; (r \backslash_{simp} s)$
+\end{center}
+That is, if we build the derivative of a regular expression $r$ with and without
+interleaved simplification, we can still extract the same POSIX value if there is one.
+This is not as straightforward as the previous proposition, as the two regular expressions
+$r\backslash s$ and $r \backslash_{simp} s$ might become very different after repeated applications of $\textit{simp}$ and the derivative operation.
+The crucial point is to find the indispensable information of
+a regular expression and show that it is kept intact during simplification, so that the simplified
+regular expression performs as well as the unsimplified one in subsequent derivative operations.
+To aid this, we use the helper function $\textit{retrieve}$ described by Sulzmann and Lu:
\\definition of retrieve\\
- This function assembled the bitcode that corresponds to a parse tree for how the current derivative matches the suffix of the string(the characters that have not yet appeared, but is stored in the value).
+ This function assembles the bitcode that corresponds to a parse tree for how the current
+ derivative matches the suffix of the string (the characters that have not yet appeared, but are stored in the value).
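+A sketch of $\textit{retrieve}$ in Scala, following the clause
+structure of Sulzmann and Lu's definition (the value constructors
+\texttt{LeftV} and \texttt{RightV} stand for the $\textit{Left}$ and
+$\textit{Right}$ values from earlier sections; the bit conventions
+match our earlier sketches and may differ from theirs):
+\begin{verbatim}
+// Values, mirroring the parse trees from earlier sections.
+abstract class Val
+case object Empty extends Val
+case class Chr(c: Char) extends Val
+case class LeftV(v: Val) extends Val
+case class RightV(v: Val) extends Val
+case class Sequ(v1: Val, v2: Val) extends Val
+case class Stars(vs: List[Val]) extends Val
+
+// retrieve walks the annotated regular expression and the value
+// in lock-step, collecting the bits attached to the nodes that
+// the value selects (partial: only defined when v fits a).
+def retrieve(a: ARexp, v: Val): List[Bit] = (a, v) match {
+  case (AONE(bs), Empty)                => bs
+  case (ACHAR(bs, _), Chr(_))           => bs
+  case (AALT(bs, List(a1)), v1)         => bs ::: retrieve(a1, v1)
+  case (AALT(bs, a1 :: as), LeftV(v1))  => bs ::: retrieve(a1, v1)
+  case (AALT(bs, a1 :: as), RightV(v2)) => bs ::: retrieve(AALT(Nil, as), v2)
+  case (ASEQ(bs, a1, a2), Sequ(v1, v2)) =>
+    bs ::: retrieve(a1, v1) ::: retrieve(a2, v2)
+  case (ASTAR(bs, a1), Stars(Nil))      => bs ::: List(S)
+  case (ASTAR(bs, a1), Stars(v1 :: vs)) =>
+    bs ::: List(Z) ::: retrieve(a1, v1) :::
+    retrieve(ASTAR(Nil, a1), Stars(vs))
+}
+\end{verbatim}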
Sulzmann and Lu used this to connect the bit-coded algorithm to the older algorithm by the following equation:\\
$\textit{inj} \;r\; c \; v = \textit{decode} \; (\textit{retrieve}\; ((\textit{internalise}\; r)\backslash_{simp} c)\; v)$\\
A little fact that needs to be stated to help comprehension:\\
@@ -1092,7 +1118,7 @@
Our purpose of using this, however, is to try to establish \\
$ \textit{retrieve} \; a \; v \;=\; \textit{retrieve} \; \textit{simp}(a) \; v'.$\\
The idea is that using $v'$,
- a simplified version of $v$ that possibly had gone through the same simplification step as $\textit{simp}(a)$ we are still able to extract the bitsequence that gives the same parsing information as the unsimplified one.
+ a simplified version of $v$ that has possibly gone through the same simplification steps as $\textit{simp}(a)$, we are still able to extract the bit-sequence that gives the same parsing information as the unsimplified one.
After establishing this, we might be able to finally bridge the gap and prove\\
$\textit{retrieve} \; (r \backslash s) \; v = \;\textit{retrieve} \; ((\textit{simp}\;r) \backslash s) \; v'$\\
and subsequently\\