diff -r 96e009a446d5 -r d50a309a0645 ChengsongTanPhdThesis/Chapters/Cubic.tex
--- a/ChengsongTanPhdThesis/Chapters/Cubic.tex	Thu Dec 01 08:49:19 2022 +0000
+++ b/ChengsongTanPhdThesis/Chapters/Cubic.tex	Mon Dec 05 17:39:10 2022 +0000
@@ -1,6 +1,18 @@
 % Chapter Template
-\chapter{A Better Bound and Other Extensions} % Main chapter title
+%We also present the idempotency property proof
+%of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
+%This reinforces our claim that the fixpoint construction
+%originally required by Sulzmann and Lu can be removed in $\blexersimp$.
+%Last but not least, we present our efforts and challenges we met
+%in further improving the algorithm by data
+%structures such as zippers.
+%----------------------------------------------------------------------------------------
+% SECTION strongsimp
+%----------------------------------------------------------------------------------------
+%TODO: search for isabelle proofs of algorithms that check equivalence
+
+\chapter{A Better Size Bound for Derivatives} % Main chapter title
 \label{Cubic}
 %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
 %in Chapter 4 to a polynomial one, and demonstrate how one can extend the
@@ -11,10 +23,17 @@
 This chapter is a ``work-in-progress''
 chapter which records
 extensions to our $\blexersimp$.
-We intend to formalise this part, which
-we have not been able to finish due to time constraints of the PhD.
+We conjecture that the finite
+size bound from the previous chapter can be
+improved to a cubic bound.
+We have implemented the corresponding algorithm in Scala.
+We intend to formalise this part in Isabelle/HOL at a
+later stage.
+%we have not been able to finish due to time constraints of the PhD.
 Nevertheless, we outline the ideas we intend to use for the proof.
 
+\section{A Stronger Version of Simplification}
+
 We present further improvements
 for our lexer algorithm $\blexersimp$.
 We devise a stronger simplification algorithm,
@@ -24,41 +43,27 @@
 even if these regular expressions are not exactly the same.
 We call the lexer that uses this stronger simplification function
 $\blexerStrong$.
-Unfortunately we did not have time to
-work out the proofs, like in
-the previous chapters.
+%Unfortunately we did not have time to
+%work out the proofs, like in
+%the previous chapters.
 We conjecture that both
 \begin{center}
 	$\blexerStrong \;r \; s = \blexer\; r\;s$
 \end{center}
-and
+and
 \begin{center}
 	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
 \end{center}
-hold, but a formalisation
-is still future work.
+hold.
+%but a formalisation
+%is still future work.
 We give an informal justification
 why the correctness and cubic size bound proofs
 can be achieved
 by exploring the connection between the internal
 data structure of our $\blexerStrong$ and
-Animirov's partial derivatives.\\
-%We also present the idempotency property proof
-%of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
-%This reinforces our claim that the fixpoint construction
-%originally required by Sulzmann and Lu can be removed in $\blexersimp$.
+Antimirov's partial derivatives.
 
-%Last but not least, we present our efforts and challenges we met
-%in further improving the algorithm by data
-%structures such as zippers.
-
-
-
-%----------------------------------------------------------------------------------------
-% SECTION strongsimp
-%----------------------------------------------------------------------------------------
-\section{A Stronger Version of Simplification}
-%TODO: search for isabelle proofs of algorithms that check equivalence
 In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
 For example, the regular expression
 \[
@@ -76,32 +81,35 @@
 $rs_a@[a_1]@rs_b@[a_2]@rs_c$
 \end{center}
 is that
+the two erased regular expressions are equal:
 \begin{center}
 	$\rerase{a_1} = \rerase{a_2}$.
 \end{center}
-It is characterised as the $LD$
-rewrite rule in figure \ref{rrewriteRules}.\\
-The problem , however, is that identical components
-in two slightly different regular expressions cannot be removed.
-Consider the simplification
+This is characterised by the $LD$
+rewrite rule in figure \ref{rrewriteRules}.
+The problem, however, is that identical components
+in two slightly different regular expressions cannot be removed
+by the $LD$ rule.
+Consider the stronger simplification
 \begin{equation}
 \label{eqn:partialDedup}
 (a+b+d) \cdot r_1 + (a+c+e) \cdot r_1 \stackrel{?}{\rightsquigarrow} (a+b+d) \cdot r_1 + (c+e) \cdot r_1
 \end{equation}
-where the $(a+\ldots)\cdot r_1$ is deleted in the right alternative.
+where the underlined $a$ in the right alternative
+$(\underline{a}+c+e)\cdot r_1$ is deleted.
 This is permissible because we have $(a+\ldots)\cdot r_1$ in the left
 alternative.
 The difficulty is that such alternatives ``buried''
 inside sequences are not easily recognised.
 But simplification like this actually
-cannot be omitted,
-as without it the size of derivatives can still
+cannot be omitted if we want to obtain a better bound.
+For example, the size of derivatives can still
 blow up even with our $\textit{bsimp}$
 function:
 consider again the example
 $\protect((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$,
-and set $n$ to a relatively small number,
-we get exponential growth:
+and set $n$ to a relatively small number like $n=5$; then we get the following
+exponential growth:
 \begin{figure}[H]
 \centering
 \begin{tikzpicture}
@@ -114,7 +122,7 @@
 	\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
 \end{axis}
 \end{tikzpicture}
-\caption{Size of derivatives of $\blexersimp$ for matching
+\caption{Size of derivatives of $\blexersimp$ from chapter 5 for matching
 	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$
 	with strings
 	of the form $\protect\underbrace{aa..a}_{n}$.}\label{blexerExp}
 \end{figure}
@@ -126,10 +134,11 @@
 (a+b+d) \cdot r_1 \longrightarrow a \cdot r_1 + b \cdot r_1 + d \cdot r_1
 \]
 \noindent
-in our $\simp$ function,
-so that it makes the simplification in \eqref{eqn:partialDedup} possible.
-Translating the rule into our $\textit{bsimp}$ function simply
-involves adding a new clause to the $\textit{bsimp}_{ASEQ}$ function:
+which pushes the sequence into the alternatives
+in our $\simp$ function.
+This would then make the simplification shown in \eqref{eqn:partialDedup} possible.
+Translating this rule into our $\textit{bsimp}$ function would simply
+involve adding a new clause to the $\textit{bsimp}_{ASEQ}$ function:
 \begin{center}
   \begin{tabular}{@{}lcl@{}}
   $\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
@@ -141,7 +150,7 @@
 \end{tabular}
 \end{center}
 \noindent
 Unfortunately,
-if we introduce them in our
+if we introduce this clause in our
 setting we would lose the POSIX property of our calculated values.
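+To make this concrete, the following is a minimal sketch of how such a
+clause could look in our Scala implementation (a hypothetical sketch,
+assuming the annotated constructors $\textit{AZERO}$, $\textit{AONE}$,
+$\textit{ASEQ}$, $\textit{AALTS}$, a bit-list type and the $\textit{fuse}$
+function from the earlier chapters); we deliberately do \emph{not} add
+this clause to $\blexerStrong$, because of the POSIX problem illustrated
+by the example below:
+\begin{lstlisting}
+// hypothetical extra clause for bsimp_ASEQ: distribute an alternative
+// on the left of a sequence, (a+b+d).r1 ~> a.r1 + b.r1 + d.r1
+def bsimpASEQ(bs: List[Bit], a: ARexp, b: ARexp): ARexp = (a, b) match {
+  case (AZERO, _) => AZERO
+  case (_, AZERO) => AZERO
+  case (AONE(bs1), _) => fuse(bs ++ bs1, b)
+  case (AALTS(bs1, rs), _) =>                      // the new clause
+    AALTS(bs ++ bs1, rs.map(r => ASEQ(Nil, r, b)))
+  case _ => ASEQ(bs, a, b)
+}
+\end{lstlisting}
+\noindent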
 For example, given the regular expression
 \begin{center}
@@ -156,47 +165,53 @@
 Essentially it matches the string with the longer Right-alternative
 in the first sequence (and then the 'rest' with the character
 regular expression $c$ from the second sequence).
-If we add the simplification above, then we obtain the following value
+If we add the simplification above, however,
+then we would obtain the following value
 \begin{center}
 	$\Left \; (\Seq \; a \; (\Left \; bc))$
 \end{center}
 where the $\Left$-alternatives get priority.
-However this violates the POSIX rules.
+This violates the POSIX rules.
 The reason for getting this undesired value
 is that the new rule splits this regular expression up into
+a topmost alternative
 \begin{center}
 	$a\cdot(b c + c) + ab \cdot (bc + c)$,
 \end{center}
-which becomes a regular expression with a
-quite different structure--the original
-was a sequence, and now it becomes an alternative.
-With an alternative the maximal munch rule no longer works.\\
-A method to reconcile this is to do the
+which is a regular expression with a
+quite different meaning: the original regular expression
+is a sequence, but the simplified regular expression is an alternative.
+With an alternative the maximal munch rule no longer works.
+
+
+A method to reconcile this problem is to do the
 transformation in \eqref{eqn:partialDedup} ``non-invasively'',
 meaning that we traverse the list of regular expressions
-\begin{center}
-	$rs_a@[a]@rs_c$
-\end{center}
-in the alternative
+%\begin{center}
+%	$rs_a@[a]@rs_c$
+%\end{center}
+inside alternatives
 \begin{center}
 	$\sum ( rs_a@[a]@rs_c)$
 \end{center}
 using a function similar to $\distinctBy$,
 but this time
-we allow a more general list rewrite:
-\begin{figure}[H]
-\begin{mathpar}
+we allow the following more general rewrite rule:
+\begin{equation}
+\label{eqn:cubicRule}
+%\mbox{
+%\begin{mathpar}
 	\inferrule * [Right = cubicRule]{\vspace{0mm} }{rs_a@[a]@rs_c
 	\stackrel{s}{\rightsquigarrow }
 	rs_a@[\textit{prune} \; a \; rs_a]@rs_c }
-\end{mathpar}
-\caption{The rule capturing the pruning simplification needed to achieve
-a cubic bound}
-\label{fig:cubicRule}
-\end{figure}
+%\end{mathpar}
+%\caption{The rule capturing the pruning simplification needed to achieve
+%a cubic bound}
+\end{equation}
+\noindent
 %L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)
 where $\textit{prune} \;a \; acc$ traverses $a$
-without altering the structure of $a$, removing components in $a$
+without altering the structure of $a$, but removing components in $a$
 that have appeared in the accumulator $acc$.
 For example
 \begin{center}
@@ -208,14 +223,20 @@
 \end{center}
 because $r_gr_d$ and
 $r_hr_d$ are the only terms
-that have not appeared in the accumulator list
+that do not appear in the accumulator list
 \begin{center}
 	$[(r_a+r_b+r_c)r_d, (r_e+r_f)r_d]$.
 \end{center}
-We implemented
-function $\textit{prune}$ in Scala:
-\begin{figure}[H]
-\begin{lstlisting}
+We implemented the
+function $\textit{prune}$ in Scala (see figure \ref{fig:pruneFunc}).
+The function $\textit{prune}$
+is a stronger version of $\textit{distinctBy}$.
+It does not just walk through a list looking for exact duplicates,
+but prunes sub-expressions recursively.
+It manages proper contexts by the helper functions
+$\textit{removeSeqTail}$, $\textit{isOne}$ and $\textit{atMostEmpty}$.
+\begin{figure}%[H]
+\begin{lstlisting}[numbers=left]
 def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
   case AALTS(bs, rs) => rs.map(r => prune(r, acc)).filter(_ != AZERO) match
   {
@@ -239,17 +260,16 @@
   case r => if(acc(erase(r))) AZERO else r
 }
 \end{lstlisting}
-\caption{The function $\textit{prune}$ }
+\caption{The function $\textit{prune}$ is called recursively in the
+alternative case (line 2) and in the sequence case (line 12).
+In the alternative case the accumulator is kept unchanged, while
+in the sequence case it is adjusted so that
+de-duplication works correctly.}\label{fig:pruneFunc}
 \end{figure}
 \noindent
-The function $\textit{prune}$
-is a stronger version of $\textit{distinctBy}$.
-It does not just walk through a list looking for exact duplicates,
-but prunes sub-expressions recursively.
-It manages proper contexts by the helper functions
-$\textit{removeSeqTail}$, $\textit{isOne}$ and $\textit{atMostEmpty}$.
-\begin{figure}[H]
-\begin{lstlisting}
+
+\begin{figure}
+\begin{lstlisting}[numbers=left]
 def atMostEmpty(r: Rexp) : Boolean = r match {
   case ZERO => true
   case ONE => true
@@ -274,17 +294,14 @@
   else {
     r match {
      case SEQ(r1, r2) =>
-       if(r2 == tail)
-         r1
-       else
-         ZERO
+       if(r2 == tail) r1 else ZERO
      case r => ZERO
    }
  }
 \end{lstlisting}
-\caption{The helper functions of $\textit{prune}$}
+\caption{The helper functions of $\textit{prune}$: $\textit{atMostEmpty}$,
+$\textit{isOne}$ and $\textit{removeSeqTail}$.}
 \end{figure}
 \noindent
 Suppose we feed
@@ -295,7 +312,7 @@
 \begin{center}
 	$acc = \{a\cdot(d\cdot e),f\cdot (g \cdot (a \cdot (d \cdot e))) \}$
 \end{center}
-as the input for $\textit{prune}$.
+as the input to $\textit{prune}$.
 The end result will be
 \[
 	b\cdot(g\cdot(a\cdot(d\cdot e)))
 \]
@@ -322,17 +339,17 @@
 	\textit{removeSeqTail} \quad \;\;
 	f\cdot(g\cdot (a \cdot(d\cdot e)))\quad \;\;
 	a \cdot(d\cdot e).
 \]
+
 The idea behind $\textit{removeSeqTail}$ is that
 when pruning recursively, we need to ``zoom in''
 to sub-expressions, and this ``zoom in'' needs to be performed
 on the
-accumulators as well, otherwise we will be comparing
-apples with oranges.
+accumulators as well, otherwise the deletion will not work.
 The sub-call
 $\textit{prune} \;\; (\ONE+(f+b)\cdot g)\;\; \{\ONE, f\cdot g \}$
 is simpler, which will trigger the alternative clause,
 causing a pruning on each element
 in $(\ONE+(f+b)\cdot g)$,
-leaving us $b\cdot g$ only.
+leaving us with $b\cdot g$ only.
 
 Our new lexer with stronger simplification
 uses $\textit{prune}$ by making it
@@ -342,7 +359,7 @@
 parts of a regular expression are pruned away.
 
 \begin{figure}[H]
-\begin{lstlisting}
+\begin{lstlisting}[numbers=left]
 def turnIntoTerms(r: Rexp): List[Rexp] = r match {
   case SEQ(r1, r2)  =>
     turnIntoTerms(r1).flatMap(r11 => furtherSEQ(r11, r2))
@@ -370,7 +387,7 @@
 }
 \end{lstlisting}
-\caption{A Stronger Version of $\textit{distinctBy}$}
+\caption{A stronger version of $\textit{distinctBy}$, together with
+the helper function $\textit{turnIntoTerms}$.}
 \end{figure}
 \noindent
 Once a regular expression has been pruned,
@@ -506,11 +523,16 @@
 \end{figure}
 \noindent
 We call this lexer $\blexerStrong$.
-This version is able to drastically reduce the
-internal data structure size which
-otherwise could
-trigger exponential behaviours in
+This version is able to reduce the
+size of the derivatives which
+otherwise
+triggered exponential behaviour in
 $\blexersimp$.
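+For reference, the derivative phase of $\blexerStrong$ can be summarised
+by the following simplified sketch (assuming the bitcoded derivative
+function $\textit{bder}$ from the earlier chapters; $\textit{bsimpStrong}$
+is the simplification function described above):
+\begin{lstlisting}
+// sketch: iterate bitcoded derivatives and apply the stronger
+// simplification (which internally calls prune) after each step
+def bdersStrong(s: List[Char], r: ARexp): ARexp = s match {
+  case Nil => r
+  case c :: cs => bdersStrong(cs, bsimpStrong(bder(c, r)))
+}
+\end{lstlisting}
+\noindent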
+Consider again the runtime for matching
+$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings
+of the form $\protect\underbrace{aa..a}_{n}$.
+These measurements produce the following graphs ($\blexerStrong$ on the
+left-hand side and $\blexersimp$ on the right-hand side).
 \begin{figure}[H]
 \centering
 \begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
@@ -541,13 +563,14 @@
 \end{tikzpicture}\\
 \multicolumn{2}{l}{}
 \end{tabular}
-\caption{Runtime for matching
-	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings
-	of the form $\protect\underbrace{aa..a}_{n}$.}\label{fig:aaaaaStarStar}
+\caption{Runtime of $\blexerStrong$ (left) and $\blexersimp$ (right) for matching
+	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings
+	of the form $\protect\underbrace{aa..a}_{n}$.}\label{fig:aaaaaStarStar}
 \end{figure}
 \noindent
-We would like to preserve the correctness like the one
-we had for $\blexersimp$:
+We expect that correctness is preserved, like
+it was for $\blexersimp$.
+The proof idea is to extend the key lemmas from chapter \ref{Bitcoded2}
+so that they also hold under the new rule \eqref{eqn:cubicRule}:
 \begin{conjecture}\label{cubicConjecture}
 	$\blexerStrong \;r \; s = \blexer\; r\;s$
 \end{conjecture}
@@ -556,7 +579,7 @@
 chapter \ref{Bitcoded2} like
 $r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$
 with the new rewriting rule
-shown in figure \ref{fig:cubicRule} .
+shown in \eqref{eqn:cubicRule}.
 
 In the next sub-section,
 we will describe why we
@@ -571,18 +594,18 @@
 \subsection{Antimirov's partial derivatives}
 Partial derivatives were first introduced by
 Antimirov \cite{Antimirov95}.
-Partial derivatives are very similar
+They are very similar
 to Brzozowski derivatives,
-but splits children of alternative regular expressions into
+but split children of alternative regular expressions into
 multiple independent terms. This means the output of
-partial derivatives become a
-set of regular expressions:
+partial derivatives is a
+set of regular expressions, defined as follows:
 \begin{center}
- 	\begin{tabular}{lcl}
+ 	\begin{tabular}{lcl@{\hspace{-5mm}}l}
 		$\partial_x \; (r_1 \cdot r_2)$ &
 		$\dn$ & $(\partial_x \; r_1) \cdot r_2 \cup
-		\partial_x \; r_2 \; \textit{if} \; \; \nullable\; r_1$\\
-		& & $(\partial_x \; r_1)\cdot r_2 \quad\quad
+		\partial_x \; r_2 \;$ & $\textit{if} \; \; \nullable\; r_1$\\
+		& & $(\partial_x \; r_1)\cdot r_2 \quad\quad$ & $\textit{otherwise}$\\
 		$\partial_x \; r^*$ & $\dn$ & $(\partial_x \; r) \cdot r^*$\\
 		$\partial_x \; c $ & $\dn$ & $\textit{if} \; x = c \;
@@ -594,7 +617,7 @@
 	\end{tabular}
 \end{center}
 \noindent
-The $\cdot$ between for example
+The $\cdot$ in, for example,
 $(\partial_x \; r_1) \cdot r_2 $
 is a shorthand notation for the cartesian product
 $(\partial_x \; r_1) \times \{ r_2\}$.
@@ -604,9 +627,10 @@
 Rather than joining the calculated derivatives $\partial_x r_1$ and $\partial_x r_2$ together
 using the $\sum$ constructor, Antimirov put them into
 a set.  This means many subterms will be de-duplicated
-because they are sets,
-For example, to compute what regular expression $x^*(xx + y)^*$'s
-derivative w.r.t. $x$ is, one can compute a partial derivative
+because they are sets.
+For example, to compute the derivative of the regular expression
+$x^*(xx + y)^*$
+w.r.t. $x$, one can compute a partial derivative
 and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
 from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.
@@ -615,47 +639,51 @@
 The partial derivative w.r.t. a string is defined recursively:
 \[
 	\partial_{c::cs} r \dn \bigcup_{r'\in (\partial_c r)}
 	\partial_{cs} r'
 \]
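+\noindent
+For reference, these definitions can be implemented in a few lines of
+Scala (a sketch, assuming the plain regular expression constructors
+$\textit{ZERO}$, $\textit{ONE}$, $\textit{CHAR}$, $\textit{SEQ}$,
+$\textit{STAR}$, a binary alternative constructor $\textit{ALT}$ and the
+$\textit{nullable}$ function from the earlier chapters):
+\begin{lstlisting}
+// Antimirov's partial derivative w.r.t. a character
+def pder(c: Char, r: Rexp): Set[Rexp] = r match {
+  case ZERO => Set()
+  case ONE => Set()
+  case CHAR(d) => if (c == d) Set(ONE) else Set()
+  case ALT(r1, r2) => pder(c, r1) ++ pder(c, r2)
+  case SEQ(r1, r2) =>
+    val left = pder(c, r1).map(r1p => SEQ(r1p, r2))
+    if (nullable(r1)) left ++ pder(c, r2) else left
+  case STAR(r1) => pder(c, r1).map(r1p => SEQ(r1p, STAR(r1)))
+}
+
+// extension to strings, following the definition above
+def pders(s: List[Char], r: Rexp): Set[Rexp] = s match {
+  case Nil => Set(r)
+  case c :: cs => pder(c, r).flatMap(r1 => pders(cs, r1))
+}
+\end{lstlisting}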
-Given an alphabet $\Sigma$, we denote the set of all possible strings
-from this alphabet as $\Sigma^*$.
-The set of all possible partial derivatives is defined
-as the union of derivatives w.r.t all the strings in the universe:
+Given an alphabet $\Sigma$, we write $\Sigma^*$ for the set of all
+possible strings over this alphabet.
+The set of all possible partial derivatives is then defined
+as the union of the partial derivatives w.r.t.~all strings:
 \begin{center}
 	\begin{tabular}{lcl}
 		$\textit{PDER}_{\Sigma^*} \; r $ & $\dn $ & $\bigcup_{w \in \Sigma^*}\partial_w \; r$
 	\end{tabular}
 \end{center}
 \noindent
-Consider now again our pathological case where the derivatives
-grow with a rather aggressive simplification
+Consider now again our pathological case, to which we want to apply
+the more aggressive
+simplification:
 \begin{center}
 	$((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$
 \end{center}
-example, if we denote this regular expression as $r$,
-we have that
+Let us abbreviate this regular expression as $r$;
+then we have that
 \begin{center}
 	$\textit{PDER}_{\Sigma^*} \; r =
 	\bigcup_{i=1}^{n}\bigcup_{j=0}^{i-1} \{
 		(\underbrace{a \ldots a}_{\text{j a's}}\cdot
 		(\underbrace{a \ldots a}_{\text{i a's}})^*)\cdot r \}$,
 \end{center}
-with exactly $n * (n + 1) / 2$ terms.
-This is in line with our speculation that only $n*(n+1)/2$ terms are
-needed. We conjecture that $\bsimpStrong$ is also able to achieve this
+where the union on the right-hand side contains exactly $n * (n + 1) / 2$ terms.
+This leads us to believe that the maximum number of terms needed
+in our derivatives is also only $n * (n + 1) / 2$.
+Therefore
+we conjecture that $\bsimpStrong$ is also able to achieve this
 upper limit in general:
 \begin{conjecture}\label{bsimpStrongInclusionPder}
-	Using a suitable transformation $f$, we have
+	Using a suitable transformation $f$, we have that
 	\begin{center}
 		$\forall s.\; f \; (r \bdersStrong \; s) \subseteq
 		 \textit{PDER}_{\Sigma^*} \; r$
 	\end{center}
+	holds.
 \end{conjecture}
 \noindent
-because our \ref{fig:cubicRule} will keep only one copy of each term,
+The reason is that our rule \eqref{eqn:cubicRule} keeps only one copy of each term,
 where the function $\textit{prune}$ takes care of maintaining
 a set-like structure similar to partial derivatives.
-We might need to adjust $\textit{prune}$
-slightly to make sure all duplicate terms are eliminated,
-which should be doable.
+%We might need to adjust $\textit{prune}$
+%slightly to make sure all duplicate terms are eliminated,
+%which should be doable.
 
 Antimirov proved that the sum of all the partial derivative
 terms' sizes is bounded by the cube of the size of that regular
 expression:
 \begin{property}\label{pderBound}
 	$\llbracket \textit{PDER}_{\Sigma^*} \; r \rrbracket = O(\llbracket r \rrbracket^3)$
 \end{property}
+\noindent
 This property was formalised by Wu et al.~\cite{Wu2014}, and the
-details can be found in the archive of formal proofs. \footnote{https://www.isa-afp.org/entries/Myhill-Nerode.html}
+details can be found in the Archive of Formal Proofs\footnote{https://www.isa-afp.org/entries/Myhill-Nerode.html}.
 Once conjecture \ref{bsimpStrongInclusionPder} is proven,
 property \ref{pderBound}
-would yield us also a cubic bound for our $\blexerStrong$ algorithm:
+would provide us with a cubic bound for our $\blexerStrong$ algorithm:
 \begin{conjecture}\label{strongCubic}
 	$\llbracket r \bdersStrong\; s \rrbracket \leq \llbracket r \rrbracket^3$
 \end{conjecture}
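+\noindent
+As a small empirical sanity check of the $n * (n + 1) / 2$ count (and of
+conjecture \ref{bsimpStrongInclusionPder}), one could enumerate the
+partial derivative terms of the pathological regular expression in
+Scala, reusing the $\textit{pders}$ sketch from above (the helper names
+below are made up for illustration):
+\begin{lstlisting}
+// a...a with k a's
+def aaa(k: Int): Rexp =
+  List.fill(k)(CHAR('a') : Rexp).reduceLeft(SEQ(_, _))
+
+// ((a* + (aa)* + ... + (a...a)*)*)*
+def pathological(n: Int): Rexp =
+  STAR(STAR((1 to n).map(i => STAR(aaa(i)) : Rexp).reduceLeft(ALT(_, _))))
+
+// all partial derivative terms w.r.t. strings a^k with 1 <= k <= depth;
+// for n = 5 the number of distinct terms should plateau at around
+// 5 * 6 / 2 = 15 instead of growing exponentially
+def pderTerms(n: Int, depth: Int): Set[Rexp] =
+  (1 to depth).flatMap(k =>
+    pders(List.fill(k)('a'), pathological(n))).toSet
+\end{lstlisting}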