diff -r 96e009a446d5 -r d50a309a0645 ChengsongTanPhdThesis/Chapters/Cubic.tex
--- a/ChengsongTanPhdThesis/Chapters/Cubic.tex	Thu Dec 01 08:49:19 2022 +0000
+++ b/ChengsongTanPhdThesis/Chapters/Cubic.tex	Mon Dec 05 17:39:10 2022 +0000
@@ -1,6 +1,18 @@
 % Chapter Template
-\chapter{A Better Bound and Other Extensions} % Main chapter title
+%We also present the idempotency property proof
+%of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
+%This reinforces our claim that the fixpoint construction
+%originally required by Sulzmann and Lu can be removed in $\blexersimp$.
+%Last but not least, we present our efforts and challenges we met
+%in further improving the algorithm by data
+%structures such as zippers.
+%----------------------------------------------------------------------------------------
+% SECTION strongsimp
+%----------------------------------------------------------------------------------------
+%TODO: search for isabelle proofs of algorithms that check equivalence
+
+\chapter{A Better Size Bound for Derivatives} % Main chapter title
 \label{Cubic}
 %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
 %in Chapter 4 to a polynomial one, and demonstrate how one can extend the
@@ -11,10 +23,17 @@
 This chapter is a ``work-in-progress''
 chapter which records
 extensions to our $\blexersimp$.
-We intend to formalise this part, which
-we have not been able to finish due to time constraints of the PhD.
+We conjecture that the finite
+size bound from the previous chapter can be
+improved to a cubic bound.
+We have implemented the corresponding algorithm in Scala.
+We intend to formalise this part in Isabelle/HOL at a
+later stage.
+%we have not been able to finish due to time constraints of the PhD.
 Nevertheless, we outline the ideas we intend to use for the proof.
 
+\section{A Stronger Version of Simplification}
+
 We present further improvements
 for our lexer algorithm $\blexersimp$.
 We devise a stronger simplification algorithm,
@@ -24,41 +43,27 @@
 even if these regular expressions are not exactly the same.
 We call the lexer that uses this stronger simplification function
 $\blexerStrong$.
-Unfortunately we did not have time to
-work out the proofs, like in
-the previous chapters.
+%Unfortunately we did not have time to
+%work out the proofs, like in
+%the previous chapters.
 We conjecture that both
 \begin{center}
 	$\blexerStrong \;r \; s = \blexer\; r\;s$
 \end{center}
-and
+and
 \begin{center}
 	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
 \end{center}
-hold, but a formalisation
-is still future work.
+hold.
+%but a formalisation
+%is still future work.
 We give an informal justification
 why the correctness and cubic size bound proofs
 can be achieved
 by exploring the connection between the internal
 data structure of our $\blexerStrong$ and
-Animirov's partial derivatives.\\
-%We also present the idempotency property proof
-%of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
-%This reinforces our claim that the fixpoint construction
-%originally required by Sulzmann and Lu can be removed in $\blexersimp$.
+Antimirov's partial derivatives.
 
-%Last but not least, we present our efforts and challenges we met
-%in further improving the algorithm by data
-%structures such as zippers.
-
-
-
-%----------------------------------------------------------------------------------------
-% SECTION strongsimp
-%----------------------------------------------------------------------------------------
-\section{A Stronger Version of Simplification}
-%TODO: search for isabelle proofs of algorithms that check equivalence
 In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
 For example, the regular expression
 \[
@@ -76,32 +81,35 @@
 $rs_a@[a_1]@rs_b@[a_2]@rs_c$
 \end{center}
 is that
+the two erased regular expressions are equal:
 \begin{center}
 	$\rerase{a_1} = \rerase{a_2}$.
 \end{center}
-It is characterised as the $LD$
-rewrite rule in figure \ref{rrewriteRules}.\\
-The problem , however, is that identical components
-in two slightly different regular expressions cannot be removed.
-Consider the simplification
+This is characterised by the $LD$
+rewrite rule in figure \ref{rrewriteRules}.
+The problem, however, is that identical components
+in two slightly different regular expressions cannot be removed
+by the $LD$ rule.
+Consider the stronger simplification
 \begin{equation}
 \label{eqn:partialDedup}
 (a+b+d) \cdot r_1 + (a+c+e) \cdot r_1 \stackrel{?}{\rightsquigarrow} (a+b+d) \cdot r_1 + (c+e) \cdot r_1
 \end{equation}
-where the $(a+\ldots)\cdot r_1$ is deleted in the right alternative.
+where the underlined $a$ in the right alternative
+$(\underline{a}+c+e)\cdot r_1$ is deleted.
 This is permissible because we have $(a+\ldots)\cdot r_1$ in the left
 alternative.
 The difficulty is that such alternatives ``buried''
 inside sequences are not easily recognised.
 But simplification like this actually
-cannot be omitted,
-as without it the size of derivatives can still
+cannot be omitted if we want to obtain a better bound.
+For example, the size of derivatives can still
 blow up even with our $\textit{bsimp}$
 function:
 consider again the example
 $\protect((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$,
-and set $n$ to a relatively small number,
-we get exponential growth:
+and set $n$ to a relatively small number like $n=5$; then we get the following
+exponential growth:
 \begin{figure}[H]
 \centering
 \begin{tikzpicture}
@@ -114,7 +122,7 @@
 	\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
 \end{axis}
 \end{tikzpicture}
-\caption{Size of derivatives of $\blexersimp$ for matching
+\caption{Size of derivatives of $\blexersimp$ from chapter 5 for matching
 	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$
 	with strings
 	of the form $\protect\underbrace{aa..a}_{n}$.}\label{blexerExp}
 \end{figure}
@@ -126,10 +134,11 @@
 (a+b+d) \cdot r_1 \longrightarrow a \cdot r_1 + b \cdot r_1 + d \cdot r_1
 \]
 \noindent
-in our $\simp$ function,
-so that it makes the simplification in \eqref{eqn:partialDedup} possible.
-Translating the rule into our $\textit{bsimp}$ function simply
-involves adding a new clause to the $\textit{bsimp}_{ASEQ}$ function:
+which pushes the sequence into the alternatives
+in our $\simp$ function.
+This would then make the simplification shown in \eqref{eqn:partialDedup} possible.
+Translating this rule into our $\textit{bsimp}$ function would simply
+involve adding a new clause to the $\textit{bsimp}_{ASEQ}$ function:
 \begin{center}
   \begin{tabular}{@{}lcl@{}}
   $\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
@@ -141,7 +150,7 @@
 \end{tabular}
 \end{center}
 \noindent
 Unfortunately,
-if we introduce them in our
+if we introduce this clause in our
 setting we would lose the POSIX property of our calculated values.
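+To make this concrete, the following is a minimal sketch of how such a
+clause could look in our Scala implementation (a hypothetical sketch,
+assuming the annotated constructors $\textit{AZERO}$, $\textit{AONE}$,
+$\textit{ASEQ}$, $\textit{AALTS}$, a bit-list type and the $\textit{fuse}$
+function from the earlier chapters); we deliberately do \emph{not} add
+this clause to $\blexerStrong$, because of the POSIX problem illustrated
+by the example below:
+\begin{lstlisting}
+// hypothetical extra clause for bsimp_ASEQ: distribute an alternative
+// on the left of a sequence, (a+b+d).r1 ~> a.r1 + b.r1 + d.r1
+def bsimpASEQ(bs: List[Bit], a: ARexp, b: ARexp): ARexp = (a, b) match {
+  case (AZERO, _) => AZERO
+  case (_, AZERO) => AZERO
+  case (AONE(bs1), _) => fuse(bs ++ bs1, b)
+  case (AALTS(bs1, rs), _) =>                      // the new clause
+    AALTS(bs ++ bs1, rs.map(r => ASEQ(Nil, r, b)))
+  case _ => ASEQ(bs, a, b)
+}
+\end{lstlisting}
+\noindent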
 For example, given the regular expression
 \begin{center}
@@ -156,47 +165,53 @@
 Essentially it matches the string with the longer Right-alternative
 in the first sequence (and then the 'rest' with the character
 regular expression $c$ from the second sequence).
-If we add the simplification above, then we obtain the following value
+If we add the simplification above, however,
+then we would obtain the following value
 \begin{center}
 	$\Left \; (\Seq \; a \; (\Left \; bc))$
 \end{center}
 where the $\Left$-alternatives get priority.
-However this violates the POSIX rules.
+This violates the POSIX rules.
 The reason for getting this undesired value
 is that the new rule splits this regular expression up into
+a topmost alternative
 \begin{center}
 	$a\cdot(b c + c) + ab \cdot (bc + c)$,
 \end{center}
-which becomes a regular expression with a
-quite different structure--the original
-was a sequence, and now it becomes an alternative.
-With an alternative the maximal munch rule no longer works.\\
-A method to reconcile this is to do the
+which is a regular expression with a
+quite different meaning: the original regular expression
+is a sequence, but the simplified regular expression is an alternative.
+With an alternative the maximal munch rule no longer works.
+
+
+A method to reconcile this problem is to do the
 transformation in \eqref{eqn:partialDedup} ``non-invasively'',
 meaning that we traverse the list of regular expressions
-\begin{center}
-	$rs_a@[a]@rs_c$
-\end{center}
-in the alternative
+%\begin{center}
+%	$rs_a@[a]@rs_c$
+%\end{center}
+inside alternatives
 \begin{center}
 	$\sum ( rs_a@[a]@rs_c)$
 \end{center}
 using a function similar to $\distinctBy$,
 but this time
-we allow a more general list rewrite:
-\begin{figure}[H]
-\begin{mathpar}
+we allow the following more general rewrite rule:
+\begin{equation}
+\label{eqn:cubicRule}
+%\mbox{
+%\begin{mathpar}
 	\inferrule * [Right = cubicRule]{\vspace{0mm} }{rs_a@[a]@rs_c
 	\stackrel{s}{\rightsquigarrow }
 	rs_a@[\textit{prune} \; a \; rs_a]@rs_c }
-\end{mathpar}
-\caption{The rule capturing the pruning simplification needed to achieve
-a cubic bound}
-\label{fig:cubicRule}
-\end{figure}
+%\end{mathpar}
+%\caption{The rule capturing the pruning simplification needed to achieve
+%a cubic bound}
+\end{equation}
+\noindent
 %L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)
 where $\textit{prune} \;a \; acc$ traverses $a$
-without altering the structure of $a$, removing components in $a$
+without altering the structure of $a$, but removing components in $a$
 that have appeared in the accumulator $acc$.
 For example
 \begin{center}
@@ -208,14 +223,20 @@
 \end{center}
 because $r_gr_d$ and
 $r_hr_d$ are the only terms
-that have not appeared in the accumulator list
+that do not appear in the accumulator list
 \begin{center}
 	$[(r_a+r_b+r_c)r_d, (r_e+r_f)r_d]$.
 \end{center}
-We implemented
-function $\textit{prune}$ in Scala:
-\begin{figure}[H]
-\begin{lstlisting}
+We implemented the
+function $\textit{prune}$ in Scala (see figure \ref{fig:pruneFunc}).
+The function $\textit{prune}$
+is a stronger version of $\textit{distinctBy}$.
+It does not just walk through a list looking for exact duplicates,
+but prunes sub-expressions recursively.
+It manages proper contexts by the helper functions
+$\textit{removeSeqTail}$, $\textit{isOne}$ and $\textit{atMostEmpty}$.
+\begin{figure}%[H]
+\begin{lstlisting}[numbers=left]
 def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
   case AALTS(bs, rs) => rs.map(r => prune(r, acc)).filter(_ != AZERO) match
   {
@@ -239,17 +260,16 @@
   case r => if(acc(erase(r))) AZERO else r
 }
 \end{lstlisting}
-\caption{The function $\textit{prune}$ }
+\caption{The function $\textit{prune}$ is called recursively in the
+alternative case (line 2) and in the sequence case (line 12).
+In the alternative case the accumulator is kept unchanged, while
+in the sequence case it is adjusted so that
+de-duplication works correctly.}\label{fig:pruneFunc}
 \end{figure}
 \noindent
-The function $\textit{prune}$
-is a stronger version of $\textit{distinctBy}$.
-It does not just walk through a list looking for exact duplicates,
-but prunes sub-expressions recursively.
-It manages proper contexts by the helper functions
-$\textit{removeSeqTail}$, $\textit{isOne}$ and $\textit{atMostEmpty}$.
-\begin{figure}[H]
-\begin{lstlisting}
+
+\begin{figure}
+\begin{lstlisting}[numbers=left]
 def atMostEmpty(r: Rexp) : Boolean = r match {
   case ZERO => true
   case ONE => true
@@ -274,17 +294,14 @@
   else {
     r match {
      case SEQ(r1, r2) =>
-       if(r2 == tail)
-         r1
-       else
-         ZERO
+       if(r2 == tail) r1 else ZERO
      case r => ZERO
    }
  }
 \end{lstlisting}
-\caption{The helper functions of $\textit{prune}$}
+\caption{The helper functions of $\textit{prune}$: $\textit{atMostEmpty}$,
+$\textit{isOne}$ and $\textit{removeSeqTail}$.}
 \end{figure}
 \noindent
 Suppose we feed
@@ -295,7 +312,7 @@
 \begin{center}
 	$acc = \{a\cdot(d\cdot e),f\cdot (g \cdot (a \cdot (d \cdot e))) \}$
 \end{center}
-as the input for $\textit{prune}$.
+as the input to $\textit{prune}$.
 The end result will be
 \[
 	b\cdot(g\cdot(a\cdot(d\cdot e)))
 \]
@@ -322,17 +339,17 @@
 	\textit{removeSeqTail} \quad \;\;
 	f\cdot(g\cdot (a \cdot(d\cdot e)))\quad \;\;
 	a \cdot(d\cdot e).
 \]
+
 The idea behind $\textit{removeSeqTail}$ is that
 when pruning recursively, we need to ``zoom in''
 to sub-expressions, and this ``zoom in'' needs to be performed
 on the
-accumulators as well, otherwise we will be comparing
-apples with oranges.
+accumulators as well, otherwise the deletion will not work.
 The sub-call
 $\textit{prune} \;\; (\ONE+(f+b)\cdot g)\;\; \{\ONE, f\cdot g \}$
 is simpler, which will trigger the alternative clause,
 causing a pruning on each element
 in $(\ONE+(f+b)\cdot g)$,
-leaving us $b\cdot g$ only.
+leaving us with $b\cdot g$ only.
 
 Our new lexer with stronger simplification
 uses $\textit{prune}$ by making it
@@ -342,7 +359,7 @@
 parts of a regular expression are pruned away.
 
 \begin{figure}[H]
-\begin{lstlisting}
+\begin{lstlisting}[numbers=left]
 def turnIntoTerms(r: Rexp): List[Rexp] = r match {
   case SEQ(r1, r2)  =>
     turnIntoTerms(r1).flatMap(r11 => furtherSEQ(r11, r2))
@@ -370,7 +387,7 @@
 }
 \end{lstlisting}
-\caption{A Stronger Version of $\textit{distinctBy}$}
+\caption{A stronger version of $\textit{distinctBy}$, together with
+the helper function $\textit{turnIntoTerms}$.}
 \end{figure}
 \noindent
 Once a regular expression has been pruned,
@@ -506,11 +523,16 @@
 \end{figure}
 \noindent
 We call this lexer $\blexerStrong$.
-This version is able to drastically reduce the
-internal data structure size which
-otherwise could
-trigger exponential behaviours in
+This version is able to reduce the
+size of the derivatives which
+otherwise
+triggered exponential behaviour in
 $\blexersimp$.
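+For reference, the derivative phase of $\blexerStrong$ can be summarised
+by the following simplified sketch (assuming the bitcoded derivative
+function $\textit{bder}$ from the earlier chapters; $\textit{bsimpStrong}$
+is the simplification function described above):
+\begin{lstlisting}
+// sketch: iterate bitcoded derivatives and apply the stronger
+// simplification (which internally calls prune) after each step
+def bdersStrong(s: List[Char], r: ARexp): ARexp = s match {
+  case Nil => r
+  case c :: cs => bdersStrong(cs, bsimpStrong(bder(c, r)))
+}
+\end{lstlisting}
+\noindent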
+Consider again the runtime for matching
+$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings
+of the form $\protect\underbrace{aa..a}_{n}$.
+These measurements produce the following graphs ($\blexerStrong$ on the
+left-hand side and $\blexersimp$ on the right-hand side).
 \begin{figure}[H]
 \centering
 \begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
@@ -541,13 +563,14 @@
 \end{tikzpicture}\\
 \multicolumn{2}{l}{}
 \end{tabular}
-\caption{Runtime for matching
-	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings
-	of the form $\protect\underbrace{aa..a}_{n}$.}\label{fig:aaaaaStarStar}
+\caption{Runtime of $\blexerStrong$ (left) and $\blexersimp$ (right) for matching
+	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings
+	of the form $\protect\underbrace{aa..a}_{n}$.}\label{fig:aaaaaStarStar}
 \end{figure}
 \noindent
-We would like to preserve the correctness like the one
-we had for $\blexersimp$:
+We expect that correctness is preserved, like
+it was for $\blexersimp$.
+The proof idea is to extend the key lemmas from chapter \ref{Bitcoded2}
+so that they also hold under the new rule \eqref{eqn:cubicRule}:
 \begin{conjecture}\label{cubicConjecture}
 	$\blexerStrong \;r \; s = \blexer\; r\;s$
 \end{conjecture}
@@ -556,7 +579,7 @@
 chapter \ref{Bitcoded2} like
 $r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$
 with the new rewriting rule
-shown in figure \ref{fig:cubicRule} .
+shown in \eqref{eqn:cubicRule}.
 
 In the next sub-section,
 we will describe why we
@@ -571,18 +594,18 @@
 \subsection{Antimirov's partial derivatives}
 Partial derivatives were first introduced by
 Antimirov \cite{Antimirov95}.
-Partial derivatives are very similar
+They are very similar
 to Brzozowski derivatives,
-but splits children of alternative regular expressions into
+but split children of alternative regular expressions into
 multiple independent terms. This means the output of
-partial derivatives become a
-set of regular expressions:
+partial derivatives is a
+set of regular expressions, defined as follows:
 \begin{center}
- 	\begin{tabular}{lcl}
+ 	\begin{tabular}{lcl@{\hspace{-5mm}}l}
 		$\partial_x \; (r_1 \cdot r_2)$ &
 		$\dn$ & $(\partial_x \; r_1) \cdot r_2 \cup
-		\partial_x \; r_2 \; \textit{if} \; \; \nullable\; r_1$\\
-		& & $(\partial_x \; r_1)\cdot r_2 \quad\quad
+		\partial_x \; r_2 \;$ & $\textit{if} \; \; \nullable\; r_1$\\
+		& & $(\partial_x \; r_1)\cdot r_2 \quad\quad$ & $\textit{otherwise}$\\
 		$\partial_x \; r^*$ & $\dn$ & $(\partial_x \; r) \cdot r^*$\\
 		$\partial_x \; c $ & $\dn$ & $\textit{if} \; x = c \;
@@ -594,7 +617,7 @@
 	\end{tabular}
 \end{center}
 \noindent
-The $\cdot$ between for example
+The $\cdot$ in, for example,
 $(\partial_x \; r_1) \cdot r_2 $
 is a shorthand notation for the cartesian product
 $(\partial_x \; r_1) \times \{ r_2\}$.
@@ -604,9 +627,10 @@
 Rather than joining the calculated derivatives $\partial_x r_1$ and $\partial_x r_2$ together
 using the $\sum$ constructor, Antimirov put them into
 a set.  This means many subterms will be de-duplicated
-because they are sets,
-For example, to compute what regular expression $x^*(xx + y)^*$'s
-derivative w.r.t. $x$ is, one can compute a partial derivative
+because they are sets.
+For example, to compute the derivative of the regular expression
+$x^*(xx + y)^*$
+w.r.t. $x$, one can compute a partial derivative
 and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
 from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.
@@ -615,47 +639,51 @@
 The partial derivative w.r.t. a string is defined recursively:
 \[
 	\partial_{c::cs} r \dn \bigcup_{r'\in (\partial_c r)}
 	\partial_{cs} r'
 \]
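+\noindent
+For reference, these definitions can be implemented in a few lines of
+Scala (a sketch, assuming the plain regular expression constructors
+$\textit{ZERO}$, $\textit{ONE}$, $\textit{CHAR}$, $\textit{SEQ}$,
+$\textit{STAR}$, a binary alternative constructor $\textit{ALT}$ and the
+$\textit{nullable}$ function from the earlier chapters):
+\begin{lstlisting}
+// Antimirov's partial derivative w.r.t. a character
+def pder(c: Char, r: Rexp): Set[Rexp] = r match {
+  case ZERO => Set()
+  case ONE => Set()
+  case CHAR(d) => if (c == d) Set(ONE) else Set()
+  case ALT(r1, r2) => pder(c, r1) ++ pder(c, r2)
+  case SEQ(r1, r2) =>
+    val left = pder(c, r1).map(r1p => SEQ(r1p, r2))
+    if (nullable(r1)) left ++ pder(c, r2) else left
+  case STAR(r1) => pder(c, r1).map(r1p => SEQ(r1p, STAR(r1)))
+}
+
+// extension to strings, following the definition above
+def pders(s: List[Char], r: Rexp): Set[Rexp] = s match {
+  case Nil => Set(r)
+  case c :: cs => pder(c, r).flatMap(r1 => pders(cs, r1))
+}
+\end{lstlisting}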
-Given an alphabet $\Sigma$, we denote the set of all possible strings
-from this alphabet as $\Sigma^*$.
-The set of all possible partial derivatives is defined
-as the union of derivatives w.r.t all the strings in the universe:
+Given an alphabet $\Sigma$, we write $\Sigma^*$ for the set of all
+possible strings over this alphabet.
+The set of all possible partial derivatives is then defined
+as the union of the partial derivatives w.r.t.~all strings:
 \begin{center}
 	\begin{tabular}{lcl}
 		$\textit{PDER}_{\Sigma^*} \; r $ & $\dn $ & $\bigcup_{w \in \Sigma^*}\partial_w \; r$
 	\end{tabular}
 \end{center}
 \noindent
-Consider now again our pathological case where the derivatives
-grow with a rather aggressive simplification
+Consider now again our pathological case, to which we want to apply
+the more aggressive
+simplification:
 \begin{center}
 	$((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$
 \end{center}
-example, if we denote this regular expression as $r$,
-we have that
+Let us abbreviate this regular expression as $r$;
+then we have that
 \begin{center}
 	$\textit{PDER}_{\Sigma^*} \; r =
 	\bigcup_{i=1}^{n}\bigcup_{j=0}^{i-1} \{
 		(\underbrace{a \ldots a}_{\text{j a's}}\cdot
 		(\underbrace{a \ldots a}_{\text{i a's}})^*)\cdot r \}$,
 \end{center}
-with exactly $n * (n + 1) / 2$ terms.
-This is in line with our speculation that only $n*(n+1)/2$ terms are
-needed. We conjecture that $\bsimpStrong$ is also able to achieve this
+where the union on the right-hand side contains exactly $n * (n + 1) / 2$ terms.
+This leads us to believe that the maximum number of terms needed
+in our derivatives is also only $n * (n + 1) / 2$.
+Therefore
+we conjecture that $\bsimpStrong$ is also able to achieve this
 upper limit in general:
 \begin{conjecture}\label{bsimpStrongInclusionPder}
-	Using a suitable transformation $f$, we have
+	Using a suitable transformation $f$, we have that
 	\begin{center}
 		$\forall s.\; f \; (r \bdersStrong \; s) \subseteq
 		 \textit{PDER}_{\Sigma^*} \; r$
 	\end{center}
+	holds.
 \end{conjecture}
 \noindent
-because our \ref{fig:cubicRule} will keep only one copy of each term,
+The reason is that our rule \eqref{eqn:cubicRule} keeps only one copy of each term,
 where the function $\textit{prune}$ takes care of maintaining
 a set-like structure similar to partial derivatives.
-We might need to adjust $\textit{prune}$
-slightly to make sure all duplicate terms are eliminated,
-which should be doable.
+%We might need to adjust $\textit{prune}$
+%slightly to make sure all duplicate terms are eliminated,
+%which should be doable.
 
 Antimirov proved that the sum of all the partial derivative
 terms' sizes is bounded by the cube of the size of that regular
 expression:
 \begin{property}\label{pderBound}
 	$\llbracket \textit{PDER}_{\Sigma^*} \; r \rrbracket = O(\llbracket r \rrbracket^3)$
 \end{property}
+\noindent
 This property was formalised by Wu et al.~\cite{Wu2014}, and the
-details can be found in the archive of formal proofs. \footnote{https://www.isa-afp.org/entries/Myhill-Nerode.html}
+details can be found in the Archive of Formal Proofs\footnote{https://www.isa-afp.org/entries/Myhill-Nerode.html}.
 Once conjecture \ref{bsimpStrongInclusionPder} is proven,
 property \ref{pderBound}
-would yield us also a cubic bound for our $\blexerStrong$ algorithm:
+would provide us with a cubic bound for our $\blexerStrong$ algorithm:
 \begin{conjecture}\label{strongCubic}
 	$\llbracket r \bdersStrong\; s \rrbracket \leq \llbracket r \rrbracket^3$
 \end{conjecture}
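+\noindent
+As a small empirical sanity check of the $n * (n + 1) / 2$ count (and of
+conjecture \ref{bsimpStrongInclusionPder}), one could enumerate the
+partial derivative terms of the pathological regular expression in
+Scala, reusing the $\textit{pders}$ sketch from above (the helper names
+below are made up for illustration):
+\begin{lstlisting}
+// a...a with k a's
+def aaa(k: Int): Rexp =
+  List.fill(k)(CHAR('a') : Rexp).reduceLeft(SEQ(_, _))
+
+// ((a* + (aa)* + ... + (a...a)*)*)*
+def pathological(n: Int): Rexp =
+  STAR(STAR((1 to n).map(i => STAR(aaa(i)) : Rexp).reduceLeft(ALT(_, _))))
+
+// all partial derivative terms w.r.t. strings a^k with 1 <= k <= depth;
+// for n = 5 the number of distinct terms should plateau at around
+// 5 * 6 / 2 = 15 instead of growing exponentially
+def pderTerms(n: Int, depth: Int): Set[Rexp] =
+  (1 to depth).flatMap(k =>
+    pders(List.fill(k)('a'), pathological(n))).toSet
+\end{lstlisting}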