diff -r 988e92a70704 -r b2d0de6aee18 ChengsongTanPhdThesis/Chapters/Finite.tex --- a/ChengsongTanPhdThesis/Chapters/Finite.tex Wed Aug 31 23:57:42 2022 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Finite.tex Thu Sep 01 23:47:37 2022 +0100 @@ -2007,108 +2007,10 @@ \multicolumn{3}{c}{Graphs: size change of 3 randomly generated regular expressions $w.r.t.$ input string length.} \end{tabular} \end{center} - - - - - \noindent Most of the regex's sizes seem to stay within a polynomial bound $w.r.t$ the original size. -This suggests a link towrads "partial derivatives" - introduced by Antimirov \cite{Antimirov95}. - - \section{Antimirov's partial derivatives} - The idea behind Antimirov's partial derivatives -is to do derivatives in a similar way as suggested by Brzozowski, -but maintain a set of regular expressions instead of a single one: - -%TODO: antimirov proposition 3.1, needs completion - \begin{center} - \begin{tabular}{ccc} - $\partial_x(a+b)$ & $=$ & $\partial_x(a) \cup \partial_x(b)$\\ -$\partial_x(\ONE)$ & $=$ & $\phi$ -\end{tabular} -\end{center} - -Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together -using the alternatives constructor, Antimirov cleverly chose to put them into -a set instead. This breaks the terms in a derivative regular expression up, -allowing us to understand what are the "atomic" components of it. -For example, To compute what regular expression $x^*(xx + y)^*$'s -derivative against $x$ is made of, one can do a partial derivative -of it and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$ -from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$. -To get all the "atomic" components of a regular expression's possible derivatives, -there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes -whatever character is available at the head of the string inside the language of a -regular expression, and gives back the character and the derivative regular expression -as a pair (which he called "monomial"): - \begin{center} - \begin{tabular}{ccc} - $\lf(\ONE)$ & $=$ & $\phi$\\ -$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\ - $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\ - $\lf(r^*)$ & $=$ & $\lf(r) \bigodot \lf(r^*)$\\ -\end{tabular} -\end{center} -%TODO: completion - -There is a slight difference in the last three clauses compared -with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular -expression $r$ with every element inside $\textit{rset}$ to create a set of -sequence derivatives, it uses the "circle dot" operator $\bigodot$ which operates -on a set of monomials (which Antimirov called "linear form") and a regular -expression, and returns a linear form: - \begin{center} - \begin{tabular}{ccc} - $l \bigodot (\ZERO)$ & $=$ & $\phi$\\ - $l \bigodot (\ONE)$ & $=$ & $l$\\ - $\phi \bigodot t$ & $=$ & $\phi$\\ - $\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\ - $\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\ - $\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\ - $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\ - $\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\ -\end{tabular} -\end{center} -%TODO: completion - - Some degree of simplification is applied when doing $\bigodot$, for example, - $l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$, - and $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and - $\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot x \rightsquigarrow \ZERO$, - and so on. - - With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regex $r$ with - an iterative procedure: - \begin{center} - \begin{tabular}{llll} -$\textit{while}$ & $(\Delta_i \neq \phi)$ & & \\ - & $\Delta_{i+1}$ & $ =$ & $\lf(\Delta_i) - \PD_i$ \\ - & $\PD_{i+1}$ & $ =$ & $\Delta_{i+1} \cup \PD_i$ \\ -$\partial_{UNIV}(r)$ & $=$ & $\PD$ & -\end{tabular} -\end{center} - - - $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$, - - -However, if we introduce them in our -setting we would lose the POSIX property of our calculated values. -A simple example for this would be the regex $(a + a\cdot b)\cdot(b\cdot c + c)$. -If we split this regex up into $a\cdot(b\cdot c + c) + a\cdot b \cdot (b\cdot c + c)$, the lexer -would give back $\Left(\Seq(\Char(a), \Left(\Char(b \cdot c))))$ instead of -what we want: $\Seq(\Right(ab), \Right(c))$. Unless we can store the structural information -in all the places where a transformation of the form $(r_1 + r_2)\cdot r \rightarrow r_1 \cdot r + r_2 \cdot r$ -occurs, and apply them in the right order once we get -a result of the "aggressively simplified" regex, it would be impossible to still get a $\POSIX$ value. -This is unlike the simplification we had before, where the rewriting rules -such as $\ONE \cdot r \rightsquigarrow r$, under which our lexer will give the same value. -We will discuss better -bounds in the last section of this chapter.\\[-6.5mm] - +We will discuss improvements to this bound in the next chapter.