ChengsongTanPhdThesis/Chapters/Chapter2.tex
changeset 505 5ce3bd8e5696
parent 503 35b80e37dfe7
child 506 69ad05398894
--- a/ChengsongTanPhdThesis/Chapters/Chapter2.tex	Fri May 06 13:22:20 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Chapter2.tex	Sat May 07 13:38:44 2022 +0100
@@ -16,8 +16,7 @@
 To have a clear idea of what we can and 
 need to prove about the algorithms involving
 Brzozowski's derivatives, there are a few 
-properties we need to be clear about 
-it.
+properties we need to be clear about .
 \subsection{function $\backslash c$ is not 1-to-1}
 \begin{center}
 The derivative $w.r.t$ character $c$ is not one-to-one.
@@ -53,10 +52,7 @@
 %	SUBSECTION 1
 %-----------------------------------
 \subsection{Subsection 1}
-
-Nunc posuere quam at lectus tristique eu ultrices augue venenatis. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Aliquam erat volutpat. Vivamus sodales tortor eget quam adipiscing in vulputate ante ullamcorper. Sed eros ante, lacinia et sollicitudin et, aliquam sit amet augue. In hac habitasse platea dictumst.
-
-
+To be completed.
 
 
 
@@ -98,7 +94,25 @@
 Similarly we could define the derivative  and simplification on 
 $\rrexp$, which would be identical to those we defined for plain $\rexp$s in chapter1, 
 except that now they can operate on alternatives taking multiple arguments.
-%TODO: definition of rder rsimp (maybe only the alternative clause)
+
+\begin{center}
+\begin{tabular}{lcr}
+$\RALTS{rs} \backslash c$ & $=$ &  $\RALTS{map\; (\_ \backslash c) \;rs}$\\
+(other clauses omitted)
+\end{tabular}
+\end{center}
+
+Now that $\rrexp$s do not have bitcodes on them, we can do the 
+duplicate removal without  an equivalence relation:
+\begin{center}
+\begin{tabular}{lcl}
+$\rdistinct{r :: rs}{rset}$ & $=$ & $\textit{if}(r \in \textit{rset}) \; \textit{then} \; \rdistinct{rs}{rset}$\\
+           			    &        & $\textit{else}\; r::\rdistinct{rs}{(rset \cup \{r\})}$
+\end{tabular}
+\end{center}
+%TODO: definition of rsimp (maybe only the alternative clause)
+
+
 The reason why these definitions  mirror precisely the corresponding operations
 on annotated regualar expressions is that we can calculate sizes more easily:
 
@@ -117,6 +131,9 @@
 \end{lemma}
 Unless stated otherwise in this chapter all $\textit{rexp}$s without
  bitcodes are seen as $\rrexp$s.
+ We also use $r_1 + r_2$ and $\RALTS{[r_1, r_2]}$ interchageably
+ as the former suits people's intuitive way of stating a binary alternative
+ regular expression.
 
  Suppose
 we have a size function for bitcoded regular expressions, written
@@ -252,25 +269,24 @@
 With the help of $\sflat{\_}$ we can state the equation in Indianpaper
 more rigorously:
 \begin{lemma}
-$\sflat{(r_1 \cdot r_2) \backslash s} = \AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}$
+$\sflat{(r_1 \cdot r_2) \backslash s} = \RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}$
 \end{lemma}
-With this property we can prove the finiteness of the size of the regex $(r_1 \cdot r_2) \backslash s$,
-much like a recursive function's termination proof.
-The function $\vsuf{\_}{\_}$ is defined this way:
-%TODO: DEF of vsuf
+
+The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string:
+
 \begin{center}
 \begin{tabular}{lcl}
 $\vsuf{[]}{\_} $ & $=$ &  $[]$\\
-$\vsuf{c::cs}{r_1}$ & $ =$ & $ \textit{if} (\rnullable{r_1}) \textit{then}  (\vsuf{cs}{(rder c r1)}) @ [c :: cs]$\\
-                                     && $\textit{else}  (\vsuf{cs}{rder c r1})  $
+$\vsuf{c::cs}{r_1}$ & $ =$ & $ \textit{if} (\rnullable{r_1}) \textit{then} \; (\vsuf{cs}{(\rder{c}{r_1})}) @ [c :: cs]$\\
+                                     && $\textit{else} \; (\vsuf{cs}{(\rder{c}{r_1}) })  $
 \end{tabular}
 \end{center}                   
 It takes into account which prefixes $s'$ of $s$ would make $r \backslash s'$ nullable,
 and keep a list of suffixes $s''$ such that $s' @ s'' = s$. The list is sorted in
 the order $r_2\backslash s''$ appears in $(r_1\cdot r_2)\backslash s$.
-In essence, $\vsuf{\_}{\_}$ is doing a "virtual derivative" of $r$, but instead of producing 
+In essence, $\vsuf{\_}{\_}$ is doing a "virtual derivative" of $r_1 \cdot r_2$, but instead of producing 
 the entire result of  $(r_1 \cdot r_2) \backslash s$, 
-it only stores all the $r_2 \backslash s''$ terms.
+it only stores all the $s''$ such that $r_2 \backslash s''$  are occurring terms in $(r_1\cdot r_2)\backslash s$.
 
 With this we can also add simplifications to both sides and get
 \begin{lemma}
@@ -278,16 +294,18 @@
 \end{lemma}
 Together with the idempotency property of $\rsimp$,
 %TODO: state the idempotency property of rsimp
-it is possible to convert the above lemma to obtain a "closed form"
-for our lexer's intermediate result without bitcodes:
 \begin{lemma}
-$\rderssimp{\sflat{(r_1 \cdot r_2) \backslash s} }= \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$
+$\rsimp(r) = \rsimp (\rsimp(r))$
 \end{lemma}
-And now the reason we have (2) in section 1 is clear:
-$\rsize{\rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}}$
-is bounded by     $\llbracket\textit{\distinctWith}\,((\ASEQ{nil}{\bderssimp{r_1}{s}}{r_2}$) ::
-    [$\bderssimp{ r_2}{s'} \;|\; s' \in \textit{Suffix}( r_1, s)])\,\approx\;{}\rrbracket + 1 $ ,
-    as $\vsuf{s}{r_1}$ is a subset of s $\textit{Suffix}( r_1, s)])$.
+it is possible to convert the above lemma to obtain a "closed form"
+for  derivatives nested with simplification:
+\begin{lemma}
+$\rderssimp{(r_1 \cdot r_2)}{s} = \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$
+\end{lemma}
+And now the reason we have (1) in section 1 is clear:
+$\rsize{\rsimp{\RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map \;(r_2 \backslash \_) \; (\vsuf{s}{r_1}))}}}$, 
+is equal to $\rsize{\rsimp{\RALTS{ ((r_1 \backslash s) \cdot r_2 :: (\map \; (r_2 \backslash \_) \; (\textit{Suffix}(r1, s))))}}}$
+    as $\vsuf{s}{r_1}$ is one way of expressing the list $\textit{Suffix}( r_1, s)$.
 
 %----------------------------------------------------------------------------------------
 %	SECTION 3
@@ -295,6 +313,74 @@
 
 \section{interaction between $\distinctWith$ and $\flts$}
 Note that it is not immediately obvious that 
-$\llbracket \distinctWith (\flts \textit{rs}) = \phi \rrbracket   \leq \llbracket \distinctWith \textit{rs} = \phi \rrbracket  $
+\begin{center}
+$\llbracket \distinctWith (\flts \textit{rs}) = \phi \rrbracket   \leq \llbracket \distinctWith \textit{rs} = \phi \rrbracket  $.
+\end{center}
+The intuition is that if we remove duplicates from the $\textit{LHS}$, at least the same amount of 
+duplicates will be removed from the list $\textit{rs}$ in the $\textit{RHS}$. 
+
+%----------------------------------------------------------------------------------------
+%	SECTION 4
+%----------------------------------------------------------------------------------------
+\section{a bound for the star regular expression}
+We have shown how to control the size of the sequence regular expression $r_1\cdot r_2$ using
+the "closed form" of $(r_1 \cdot r_2) \backslash s$ and then 
+the property of the $\distinct$ function.
+Now we try to get a bound on $r^* \backslash s$ as well.
+Again, we first look at how a star's derivatives evolve, if they grow maximally: 
+\begin{center}
+
+$r^* \quad \longrightarrow_{\backslash c}  \quad   (r\backslash c)  \cdot  r^* \quad \longrightarrow_{\backslash c'}  \quad
+r \backslash cc'  \cdot r^* + r \backslash c' \cdot r^*  \quad \longrightarrow_{\backslash c''} \quad 
+(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*)   \quad \longrightarrow_{\backslash c'''}
+\quad \ldots$
+
+\end{center}
+When we have a string $s = c :: c' :: c'' \ldots$  such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$, 
+$r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable,
+the number of terms in $r^* \backslash s$ will grow exponentially, causing the size
+of the derivatives $r^* \backslash s$ to grow exponentially, even if we do not 
+count the possible size explosions of $r \backslash c$ themselves.
 
-Sed ullamcorper quam eu nisl interdum at interdum enim egestas. Aliquam placerat justo sed lectus lobortis ut porta nisl porttitor. Vestibulum mi dolor, lacinia molestie gravida at, tempus vitae ligula. Donec eget quam sapien, in viverra eros. Donec pellentesque justo a massa fringilla non vestibulum metus vestibulum. Vestibulum in orci quis felis tempor lacinia. Vivamus ornare ultrices facilisis. Ut hendrerit volutpat vulputate. Morbi condimentum venenatis augue, id porta ipsum vulputate in. Curabitur luctus tempus justo. Vestibulum risus lectus, adipiscing nec condimentum quis, condimentum nec nisl. Aliquam dictum sagittis velit sed iaculis. Morbi tristique augue sit amet nulla pulvinar id facilisis ligula mollis. Nam elit libero, tincidunt ut aliquam at, molestie in quam. Aenean rhoncus vehicula hendrerit.
+Thanks to $\flts$ and $\distinctWith$, we are able to open up regexes like
+$(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ 
+into $\RALTS{[r_1 \backslash cc'c'' \cdot r^*, r \backslash c'', r \backslash c'c'' \cdot r^*, r \backslash c'' \cdot r^*]}$
+and then de-duplicate terms of the form $r\backslash s' \cdot r^*$ ($s'$ being a substring of $s$).
+For this we define $\hflataux{\_}$ and $\hflat{\_}$, similar to $\sflataux{\_}$ and $\sflat{\_}$:
+%TODO: definitions of  and \hflataux \hflat
+ \begin{center}  
+ \begin{tabular}{ccc}
+ $\hflataux{r_1 + r_2}$ & $=$ & $\hflataux{r_1} @ \hflataux{r_2}$\\
+$\hflataux r$ & $=$ & $ [r]$
+\end{tabular}
+\end{center}
+
+ \begin{center}  
+ \begin{tabular}{ccc}
+ $\hflat{r_1 + r_2}$ & $=$ & $\RALTS{\hflataux{r_1} @ \hflataux{r_2}}$\\
+$\hflat r$ & $=$ & $ r$
+\end{tabular}
+\end{center}
+Again these definitions are tailor-made for dealing with alternatives that have
+originated from a star's derivatives, so we don't attempt to open up all possible 
+regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2
+elements.
+We give a predicate for such "star-created" regular expressions:
+\begin{center}
+\begin{tabular}{lcr}
+         &    &       $\createdByStar{\RSEQ{ra}{\RSTAR{rb}}}$\\
+ $\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$
+ \end{tabular}
+ \end{center}
+ 
+One might wonder the actual bound rather than the loose bound we gave
+for the convenience of a easier proof.
+How much can the regex $r^* \backslash s$ grow? As hinted earlier,
+ they can grow at a maximum speed
+of exponential $w.r.t$ the number of characters.
+But they will eventually level off when the string $s$ is long enough,
+as suggested by the finiteness bound proof.
+
+%TODO: give out a graph showing how the size of the regex grows and levels off at a size exponential to the original regex
+
+