ChengsongTanPhdThesis/Chapters/Chapter2.tex
changeset 500 4d9eecfc936a
parent 468 a0f27e21b42c
child 503 35b80e37dfe7
equal deleted inserted replaced
498:ab626b60ee64 500:4d9eecfc936a
     1 % Chapter Template
     1 % Chapter Template
     2 
     2 
     3 \chapter{Chapter Title Here} % Main chapter title
     3 \chapter{Chapter Title Here} % Main chapter title
     4 
     4 
     5 \label{ChapterX} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
     5 \label{ChapterX} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
       
     6 
       
     7 
       
     8 
     6 
     9 
     7 %----------------------------------------------------------------------------------------
    10 %----------------------------------------------------------------------------------------
     8 %	SECTION 1
    11 %	SECTION 1
     9 %----------------------------------------------------------------------------------------
    12 %----------------------------------------------------------------------------------------
    10 
    13 
    51 %-----------------------------------
    54 %-----------------------------------
    52 \subsection{Subsection 1}
    55 \subsection{Subsection 1}
    53 
    56 
    54 Nunc posuere quam at lectus tristique eu ultrices augue venenatis. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Aliquam erat volutpat. Vivamus sodales tortor eget quam adipiscing in vulputate ante ullamcorper. Sed eros ante, lacinia et sollicitudin et, aliquam sit amet augue. In hac habitasse platea dictumst.
    57 Nunc posuere quam at lectus tristique eu ultrices augue venenatis. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Aliquam erat volutpat. Vivamus sodales tortor eget quam adipiscing in vulputate ante ullamcorper. Sed eros ante, lacinia et sollicitudin et, aliquam sit amet augue. In hac habitasse platea dictumst.
    55 
    58 
       
    59 
       
    60 
       
    61 
       
    62 
       
    63 
       
    64 
       
    65 
       
    66 
    56 %-----------------------------------
    67 %-----------------------------------
    57 %	SUBSECTION 2
    68 %	SECTION 2
    58 %-----------------------------------
    69 %-----------------------------------
    59 
    70 
    60 \subsection{Subsection 2}
    71 \section{Finiteness Property}
    61 Morbi rutrum odio eget arcu adipiscing sodales. Aenean et purus a est pulvinar pellentesque. Cras in elit neque, quis varius elit. Phasellus fringilla, nibh eu tempus venenatis, dolor elit posuere quam, quis adipiscing urna leo nec orci. Sed nec nulla auctor odio aliquet consequat. Ut nec nulla in ante ullamcorper aliquam at sed dolor. Phasellus fermentum magna in augue gravida cursus. Cras sed pretium lorem. Pellentesque eget ornare odio. Proin accumsan, massa viverra cursus pharetra, ipsum nisi lobortis velit, a malesuada dolor lorem eu neque.
    72 In this section let us sketch our argument for why the size of the simplified
       
    73 derivatives with the aggressive simplification function can be finitely bounded.
       
    74 For convenience, we use a new datatype which we call $\textit{rrexp}$ to denote
       
    75 the difference between it and annotated regular expressions. 
       
    76 For $\rrexp$ we throw away the bitcodes on the annotated regular expressions, but keeps
       
    77 everything else intact.
       
    78 It is similar to annotated regular expression being $\erase$ed, but with all its structure being intact
       
    79 (the $\erase$ function unfortunately does not preserve structure in the case of empty and singleton
       
    80 $\AALTS$.
       
    81 We denote the operation of erasing the bits and turning an annotated regular expression 
       
    82 into an $\rrexp$ as $\rerase$.
       
    83 %TODO: definition of rerase
       
    84 That we can bound the size of annotated regular expressions by 
       
    85 $\rrexp$ is that the bitcodes grow linearly w.r.t the input string, and don't contribute to the structural size of 
       
    86 an annotated regular expression:
       
    87 $\rsize (\rerase a) = \asize a$
       
    88 
       
    89  Suppose
       
    90 we have a size function for bitcoded regular expressions, written
       
    91 $\llbracket r\rrbracket$, which counts the number of nodes if we regard $r$ as a tree
       
    92 (we omit the precise definition; ditto for lists $\llbracket r\!s\rrbracket$). For this we show that for every $r$
       
    93 there exists a bound $N$
       
    94 such that 
       
    95 
       
    96 \begin{center}
       
    97 $\forall s. \; \llbracket{\bderssimp r s}\rrbracket \leq N$
       
    98 \end{center}
       
    99 
       
   100 \noindent
       
   101 We prove this by induction on $r$. The base cases for $\AZERO$,
       
   102 $\AONE \textit{bs}$ and $\ACHAR \textit{bs} c$ are straightforward. The interesting case is
       
   103 for sequences of the form $\ASEQ \textit{bs} r_1 r_2$. In this case our induction
       
   104 hypotheses state $\exists N_1. \forall s. \; \llbracket \bderssimp(r_1, s) \rrbracket \leq N_1$ and
       
   105 $\exists N_2. \forall s. \; \llbracket \bderssimp(r_2, s) \rrbracket \leq N_2$. We can reason as follows
       
   106 %
       
   107 \begin{center}
       
   108 \begin{tabular}{lcll}
       
   109 & & $ \llbracket \bderssimp( (\ASEQ\, \textit{bs} \, r_1 \,r_2),  s) \rrbracket$\\
       
   110 & $ = $ & $\llbracket bsimp\,(\textit{ALTs}\;bs\;((ASEQ [] (\bderssimp r_1 s) r_2) ::
       
   111     [\bderssimp r_2 s' \;|\; s' \in \textit{Suffix}( r_1, s)]))\rrbracket $ & (1) \\
       
   112 & $\leq$ &
       
   113     $\llbracket\textit{\distinctWith}\,((\ASEQ [] (\bderssimp r_1 s) r_2$) ::
       
   114     [$\bderssimp r_2 s' \;|\; s' \in \textit{Suffix}( r_1, s)])\,\approx\;{}\rrbracket + 1 $ & (2) \\
       
   115 & $\leq$ & $\llbracket\ASEQ [] (\bderssimp r_1 s) r_2\rrbracket +
       
   116              \llbracket\textit{distinctWith}\,[\bderssimp r_2 s' \;|\; s' \in \textit{Suffix}( r_1, s)]\,\approx\;{}\rrbracket + 1 $ & (3) \\
       
   117 & $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 +
       
   118       \llbracket \distinctWith\,[ \bderssimp r_2 s' \;|\; s' \in \textit{Suffix}( r_1, s)] \,\approx\;\rrbracket$ & (4)\\
       
   119 & $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 + l_{N_{2}} * N_{2}$ & (5)
       
   120 \end{tabular}
       
   121 \end{center}
       
   122 
       
   123 % tell Chengsong about Indian paper of closed forms of derivatives
       
   124 
       
   125 \noindent
       
   126 where in (1) the $\textit{Suffix}( r_1, s)$ are the all the suffixes of $s$ where $\bderssimp r_1 s'$ is nullable ($s'$ being a suffix of $s$).
       
   127 The reason why we could write the derivatives of a sequence this way is described in section 2.
       
   128 The term (2) is used to control (1) since it we know that you can obtain an overall
       
   129 smaller regex list
       
   130 by flattening it and removing $\ZERO$s first before applying $\distinctWith$ on it.
       
   131 Section 3 is dedicated to its proof.
       
   132 In (3) we know that  $\llbracket\ASEQ [] (\bderssimp r_1 s) r_2\rrbracket$ is 
       
   133 bounded by $N_1 + \llbracket{}r_2\rrbracket + 1$. In (5) we know the list comprehension contains only regular expressions of size smaller
       
   134 than $N_2$. The list length after  $\distinctWith$ is bounded by a number, which we call $l_{N_2}$. It stands
       
   135 for the number of distinct regular expressions smaller than $N_2$ (there can only be finitely many of them).
       
   136 We reason similarly for  $\STAR$.\medskip
       
   137 
       
   138 \noindent
       
   139 Clearly we give in this finiteness argument (Step (5)) a very loose bound that is
       
   140 far from the actual bound we can expect. We can do better than this, but this does not improve
       
   141 the finiteness property we are proving. If we are interested in a polynomial bound,
       
   142 one would hope to obtain a similar tight bound as for partial
       
   143 derivatives introduced by Antimirov \cite{Antimirov95}. After all the idea with
       
   144  $\distinctWith$ is to maintain a ``set'' of alternatives (like the sets in
       
   145 partial derivatives). Unfortunately to obtain the exact same bound would mean
       
   146 we need to introduce simplifications, such as
       
   147  $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
       
   148 which exist for partial derivatives. However, if we introduce them in our
       
   149 setting we would lose the POSIX property of our calculated values. We leave better
       
   150 bounds for future work.\\[-6.5mm]
       
   151 
    62 
   152 
    63 %----------------------------------------------------------------------------------------
   153 %----------------------------------------------------------------------------------------
    64 %	SECTION 2
   154 %	SECTION 2
    65 %----------------------------------------------------------------------------------------
   155 %----------------------------------------------------------------------------------------
    66 
   156 
    67 \section{Main Section 2}
   157 \section{"Closed Forms" of regular expressions' derivatives w.r.t strings}
       
   158 There is a nice property about the compound regular expression 
       
   159 $r_1 \cdot r_2$ in general,
       
   160 that the derivatives of it against a string $s$ can be described by
       
   161 the derivatives w.r.t $r_1$ and $r_2$ over substrings of $s$:
       
   162 \begin{lemma}
       
   163 $\textit{sflat}\_{aux} ( (r_1 \cdot r_2) \backslash s) = (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf s r_1))$
       
   164 \end{lemma}
       
   165 
       
   166 
       
   167 %----------------------------------------------------------------------------------------
       
   168 %	SECTION 3
       
   169 %----------------------------------------------------------------------------------------
       
   170 
       
   171 \section{interaction between $\distinctWith$ and $\flts$}
       
   172 Note that it is not immediately obvious that 
       
   173 $\llbracket \distinctWith (\flts \textit{rs}) = \phi \rrbracket   \leq \llbracket \distinctWith \textit{rs} = \phi \rrbracket  $
    68 
   174 
    69 Sed ullamcorper quam eu nisl interdum at interdum enim egestas. Aliquam placerat justo sed lectus lobortis ut porta nisl porttitor. Vestibulum mi dolor, lacinia molestie gravida at, tempus vitae ligula. Donec eget quam sapien, in viverra eros. Donec pellentesque justo a massa fringilla non vestibulum metus vestibulum. Vestibulum in orci quis felis tempor lacinia. Vivamus ornare ultrices facilisis. Ut hendrerit volutpat vulputate. Morbi condimentum venenatis augue, id porta ipsum vulputate in. Curabitur luctus tempus justo. Vestibulum risus lectus, adipiscing nec condimentum quis, condimentum nec nisl. Aliquam dictum sagittis velit sed iaculis. Morbi tristique augue sit amet nulla pulvinar id facilisis ligula mollis. Nam elit libero, tincidunt ut aliquam at, molestie in quam. Aenean rhoncus vehicula hendrerit.
   175 Sed ullamcorper quam eu nisl interdum at interdum enim egestas. Aliquam placerat justo sed lectus lobortis ut porta nisl porttitor. Vestibulum mi dolor, lacinia molestie gravida at, tempus vitae ligula. Donec eget quam sapien, in viverra eros. Donec pellentesque justo a massa fringilla non vestibulum metus vestibulum. Vestibulum in orci quis felis tempor lacinia. Vivamus ornare ultrices facilisis. Ut hendrerit volutpat vulputate. Morbi condimentum venenatis augue, id porta ipsum vulputate in. Curabitur luctus tempus justo. Vestibulum risus lectus, adipiscing nec condimentum quis, condimentum nec nisl. Aliquam dictum sagittis velit sed iaculis. Morbi tristique augue sit amet nulla pulvinar id facilisis ligula mollis. Nam elit libero, tincidunt ut aliquam at, molestie in quam. Aenean rhoncus vehicula hendrerit.