ChengsongTanPhdThesis/Chapters/ChapterFinite.tex
changeset 528 28751de4b4ba
parent 527 2c907b118f78
child 530 823d9b19d21c
--- a/ChengsongTanPhdThesis/Chapters/ChapterFinite.tex	Sat May 28 17:17:18 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/ChapterFinite.tex	Mon May 30 14:41:09 2022 +0100
@@ -247,15 +247,22 @@
 rules very effectively.
 
 
-As the graphs of  some more randomly generated regexes show, the size of 
-the derivative might grow quickly at the start of the input,
- but after a sufficiently long  input string, the trend will stop.
+In our proof for the inductive case $r_1 \cdot r_2$, the dominant term in the bound
+is $l_{N_2} * N_2$, where $N_2$ is the bound we have for $\llbracket \bderssimp{r_2}{s} \rrbracket$.
+Given that $l_{N_2}$ is roughly the size $4^{N_2}$, the size bound $\llbracket \bderssimp{r_1 \cdot r_2}{s} \rrbracket$
+inflates the size bound of $\llbracket \bderssimp{r_2}{s} \rrbracket$ with the function
+$f(x) = x * 2^x$.
+This means the bound we have will surge up at least
+tower-exponentially with a linear increase of the depth.
+For a regex of depth $n$, the bound
+would be approximately $4^n$.
 
-
+Test data in the graphs from randomly generated regular expressions
+shows that the giant bounds are far from being hit.
 %a few sample regular experessions' derivatives
 %size change
-%TODO: giving graphs showing a few regexes' size changes 
-%w;r;t the input characters number
+%TODO: giving regex1_size_change.data showing a few regexes' size changes 
+%w;r;t the input characters number, where the size is usually cubic in terms of original size
 %a*, aa*, aaa*, .....
 %randomly generated regexes
 \begin{center}
@@ -331,29 +338,87 @@
 
 
 \noindent
-Clearly we give in this finiteness argument (Step (5)) a very loose bound that is
-far from the actual bound we can expect. 
-In our proof for the inductive case $r_1 \cdot r_2$, the dominant term in the bound
-is $l_{N_2} * N_2$, where $N_2$ is the bound we have for $\llbracket \bderssimp{r_2}{s} \rrbracket$.
-Given that $l_{N_2}$ is roughly the size $4^{N_2}$, the size bound $\llbracket \bderssimp{r_1 \cdot r_2}{s} \rrbracket$
-inflates the size bound of $\llbracket \bderssimp{r_2}{s} \rrbracket$ with the function
-$f(x) = x * 2^x$.
-This means the bound we have will surge up at least
-tower-exponentially with a linear increase of the depth.
-For a regex of depth $n$, the bound
-would be approximately $4^n$.
-%TODO: change this to tower exponential!
+Most of the regex's sizes seem to stay within a polynomial bound $w.r.t$ the 
+original size.
+This suggests a link towrads "partial derivatives"
+ introduced by Antimirov \cite{Antimirov95}.
+ 
+ \section{Antimirov's partial derivatives}
+ The idea behind Antimirov's partial derivatives
+is to do derivatives in a similar way as suggested by Brzozowski, 
+but maintain a set of regular expressions instead of a single one:
+
+%TODO: antimirov proposition 3.1, needs completion
+ \begin{center}  
+ \begin{tabular}{ccc}
+ $\partial_x(a+b)$ & $=$ & $\partial_x(a) \cup \partial_x(b)$\\
+$\partial_x(\ONE)$ & $=$ & $\phi$
+\end{tabular}
+\end{center}
+
+Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together
+using the alternatives constructor, Antimirov cleverly chose to put them into
+a set instead.  This breaks the terms in a derivative regular expression up, 
+allowing us to understand what are the "atomic" components of it.
+For example, To compute what regular expression $x^*(xx + y)^*$'s 
+derivative against $x$ is made of, one can do a partial derivative
+of it and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
+from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.
+To get all the "atomic" components of a regular expression's possible derivatives,
+there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes
+whatever character is available at the head of the string inside the language of a
+regular expression, and gives back the character and the derivative regular expression
+as a pair (which he called "monomial"):
+ \begin{center}  
+ \begin{tabular}{ccc}
+ $\lf(\ONE)$ & $=$ & $\phi$\\
+$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\
+ $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
+ $\lf(r^*)$ & $=$ & $\lf(r) \bigodot \lf(r^*)$\\
+\end{tabular}
+\end{center}
+%TODO: completion
 
-We can do better than this, but this does not improve
-the finiteness property we are proving. If we are interested in a polynomial bound,
-one would hope to obtain a similar tight bound as for partial
-derivatives introduced by Antimirov \cite{Antimirov95}. After all the idea with
- $\distinctWith$ is to maintain a ``set'' of alternatives (like the sets in
-partial derivatives). 
-To obtain the exact same bound would mean
-we need to introduce simplifications, such as
+There is a slight difference in the last three clauses compared
+with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular 
+expression $r$ with every element inside $\textit{rset}$ to create a set of 
+sequence derivatives, it uses the "circle dot" operator $\bigodot$ which operates 
+on a set of monomials (which Antimirov called "linear form") and a regular 
+expression, and returns a linear form:
+ \begin{center}  
+ \begin{tabular}{ccc}
+ $l \bigodot (\ZERO)$ & $=$ & $\phi$\\
+ $l \bigodot (\ONE)$ & $=$ & $l$\\
+ $\phi \bigodot t$ & $=$ & $\phi$\\
+ $\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\
+ $\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\
+  $\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\
+ $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
+ $\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\
+\end{tabular}
+\end{center}
+%TODO: completion
+
+ Some degree of simplification is applied when doing $\bigodot$, for example,
+ $l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,
+ and  $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and
+  $\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot x \rightsquigarrow \ZERO$,
+  and so on.
+  
+  With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regex $r$ with 
+  an iterative procedure:
+   \begin{center}  
+ \begin{tabular}{llll}
+$\textit{while}$ & $(\Delta_i \neq \phi)$  &                &          \\
+ 		       &  $\Delta_{i+1}$           &        $ =$ & $\lf(\Delta_i) - \PD_i$ \\
+		       &  $\PD_{i+1}$              &         $ =$ & $\Delta_{i+1} \cup \PD_i$ \\
+$\partial_{UNIV}(r)$ & $=$ & $\PD$ &		     
+\end{tabular}
+\end{center}
+
+
  $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
-which exist for partial derivatives. 
+
 
 However, if we introduce them in our
 setting we would lose the POSIX property of our calculated values.