diff -r 2c907b118f78 -r 28751de4b4ba ChengsongTanPhdThesis/Chapters/ChapterFinite.tex --- a/ChengsongTanPhdThesis/Chapters/ChapterFinite.tex Sat May 28 17:17:18 2022 +0100 +++ b/ChengsongTanPhdThesis/Chapters/ChapterFinite.tex Mon May 30 14:41:09 2022 +0100 @@ -247,15 +247,22 @@ rules very effectively. -As the graphs of some more randomly generated regexes show, the size of -the derivative might grow quickly at the start of the input, - but after a sufficiently long input string, the trend will stop. +In our proof for the inductive case $r_1 \cdot r_2$, the dominant term in the bound +is $l_{N_2} * N_2$, where $N_2$ is the bound we have for $\llbracket \bderssimp{r_2}{s} \rrbracket$. +Given that $l_{N_2}$ is roughly the size $4^{N_2}$, the size bound $\llbracket \bderssimp{r_1 \cdot r_2}{s} \rrbracket$ +inflates the size bound of $\llbracket \bderssimp{r_2}{s} \rrbracket$ with the function +$f(x) = x * 2^x$. +This means the bound we have will surge up at least +tower-exponentially with a linear increase of the depth. +For a regex of depth $n$, the bound +would be approximately $4^n$. - +Test data in the graphs from randomly generated regular expressions +shows that the giant bounds are far from being hit. %a few sample regular experessions' derivatives %size change -%TODO: giving graphs showing a few regexes' size changes -%w;r;t the input characters number +%TODO: giving regex1_size_change.data showing a few regexes' size changes +%w;r;t the input characters number, where the size is usually cubic in terms of original size %a*, aa*, aaa*, ..... %randomly generated regexes \begin{center} @@ -331,29 +338,87 @@ \noindent -Clearly we give in this finiteness argument (Step (5)) a very loose bound that is -far from the actual bound we can expect. -In our proof for the inductive case $r_1 \cdot r_2$, the dominant term in the bound -is $l_{N_2} * N_2$, where $N_2$ is the bound we have for $\llbracket \bderssimp{r_2}{s} \rrbracket$. -Given that $l_{N_2}$ is roughly the size $4^{N_2}$, the size bound $\llbracket \bderssimp{r_1 \cdot r_2}{s} \rrbracket$ -inflates the size bound of $\llbracket \bderssimp{r_2}{s} \rrbracket$ with the function -$f(x) = x * 2^x$. -This means the bound we have will surge up at least -tower-exponentially with a linear increase of the depth. -For a regex of depth $n$, the bound -would be approximately $4^n$. -%TODO: change this to tower exponential! +Most of the regex's sizes seem to stay within a polynomial bound $w.r.t$ the +original size. +This suggests a link towrads "partial derivatives" + introduced by Antimirov \cite{Antimirov95}. + + \section{Antimirov's partial derivatives} + The idea behind Antimirov's partial derivatives +is to do derivatives in a similar way as suggested by Brzozowski, +but maintain a set of regular expressions instead of a single one: + +%TODO: antimirov proposition 3.1, needs completion + \begin{center} + \begin{tabular}{ccc} + $\partial_x(a+b)$ & $=$ & $\partial_x(a) \cup \partial_x(b)$\\ +$\partial_x(\ONE)$ & $=$ & $\phi$ +\end{tabular} +\end{center} + +Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together +using the alternatives constructor, Antimirov cleverly chose to put them into +a set instead. This breaks the terms in a derivative regular expression up, +allowing us to understand what are the "atomic" components of it. +For example, To compute what regular expression $x^*(xx + y)^*$'s +derivative against $x$ is made of, one can do a partial derivative +of it and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$ +from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$. +To get all the "atomic" components of a regular expression's possible derivatives, +there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes +whatever character is available at the head of the string inside the language of a +regular expression, and gives back the character and the derivative regular expression +as a pair (which he called "monomial"): + \begin{center} + \begin{tabular}{ccc} + $\lf(\ONE)$ & $=$ & $\phi$\\ +$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\ + $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\ + $\lf(r^*)$ & $=$ & $\lf(r) \bigodot \lf(r^*)$\\ +\end{tabular} +\end{center} +%TODO: completion -We can do better than this, but this does not improve -the finiteness property we are proving. If we are interested in a polynomial bound, -one would hope to obtain a similar tight bound as for partial -derivatives introduced by Antimirov \cite{Antimirov95}. After all the idea with - $\distinctWith$ is to maintain a ``set'' of alternatives (like the sets in -partial derivatives). -To obtain the exact same bound would mean -we need to introduce simplifications, such as +There is a slight difference in the last three clauses compared +with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular +expression $r$ with every element inside $\textit{rset}$ to create a set of +sequence derivatives, it uses the "circle dot" operator $\bigodot$ which operates +on a set of monomials (which Antimirov called "linear form") and a regular +expression, and returns a linear form: + \begin{center} + \begin{tabular}{ccc} + $l \bigodot (\ZERO)$ & $=$ & $\phi$\\ + $l \bigodot (\ONE)$ & $=$ & $l$\\ + $\phi \bigodot t$ & $=$ & $\phi$\\ + $\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\ + $\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\ + $\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\ + $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\ + $\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\ +\end{tabular} +\end{center} +%TODO: completion + + Some degree of simplification is applied when doing $\bigodot$, for example, + $l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$, + and $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and + $\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot x \rightsquigarrow \ZERO$, + and so on. + + With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regex $r$ with + an iterative procedure: + \begin{center} + \begin{tabular}{llll} +$\textit{while}$ & $(\Delta_i \neq \phi)$ & & \\ + & $\Delta_{i+1}$ & $ =$ & $\lf(\Delta_i) - \PD_i$ \\ + & $\PD_{i+1}$ & $ =$ & $\Delta_{i+1} \cup \PD_i$ \\ +$\partial_{UNIV}(r)$ & $=$ & $\PD$ & +\end{tabular} +\end{center} + + $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$, -which exist for partial derivatives. + However, if we introduce them in our setting we would lose the POSIX property of our calculated values.