ChengsongTanPhdThesis/Chapters/Chapter2.tex
changeset 506 69ad05398894
parent 505 5ce3bd8e5696
child 507 213220f54a6e
equal deleted inserted replaced
505:5ce3bd8e5696 506:69ad05398894
   366 regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2
   366 regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2
   367 elements.
   367 elements.
   368 We give a predicate for such "star-created" regular expressions:
   368 We give a predicate for such "star-created" regular expressions:
   369 \begin{center}
   369 \begin{center}
   370 \begin{tabular}{lcr}
   370 \begin{tabular}{lcr}
   371          &    &       $\createdByStar{\RSEQ{ra}{\RSTAR{rb}}}$\\
   371          &    &       $\createdByStar{(\RSEQ{ra}{\RSTAR{rb}}) }$\\
   372  $\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$
   372  $\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$
   373  \end{tabular}
   373  \end{tabular}
   374  \end{center}
   374  \end{center}
       
   375  
       
   376  These definitions allows us the flexibility to talk about 
       
   377  regular expressions in their most convenient format,
       
   378  for example, flattened out $\RALTS{[r_1, r_2, \ldots, r_n]} $
       
   379  instead of binary-nested: $((r_1 + r_2) + (r_3 + r_4)) + \ldots$.
       
   380  These definitions help express that certain classes of syntatically 
       
   381  distinct regular expressions are actually the same under simplification.
       
   382  This is not entirely true for annotated regular expressions: 
       
   383  %TODO: bsimp bders \neq bderssimp
       
   384  \begin{center}
       
   385  $(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
       
   386  \end{center}
       
   387  For bit-codes, the order in which simplification is applied
       
   388  might cause a difference in the location they are placed.
       
   389  If we want something like
       
   390  \begin{center}
       
   391  $\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$
       
   392  \end{center}
       
   393  Some "canonicalization" procedure is required,
       
   394  which either pushes all the common bitcodes to nodes
       
   395   as senior as possible:
       
   396   \begin{center}
       
   397   $_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $
       
   398   \end{center}
       
   399  or does the reverse. However bitcodes are not of interest if we are talking about
       
   400  the $\llbracket r \rrbracket$ size of a regex.
       
   401  Therefore for the ease and simplicity of producing a
       
   402  proof for a size bound, we are happy to restrict ourselves to 
       
   403  unannotated regular expressions, and obtain such equalities as
       
   404  \begin{lemma}
       
   405  $\rsimp{(r_1 + r_2)} = \rsimp{\RALTS{\hflataux{r_1} @ \hflataux{r_2}}}$
       
   406  \end{lemma}
       
   407  
       
   408  \begin{proof}
       
   409  By using the rewriting relation $\rightsquigarrow$
       
   410  \end{proof}
       
   411  %TODO: rsimp sflat
       
   412 And from this we obtain a proof that a star's derivative will be the same
       
   413 as if it had all its nested alternatives created during deriving being flattened out:
       
   414  For example,
       
   415  \begin{lemma}
       
   416  $\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
       
   417  \end{lemma}
       
   418  \begin{proof}
       
   419  By structural induction on $r$, where the induction rules are these of $\createdByStar{_}$.
       
   420  \end{proof}
       
   421 % The simplification of a flattened out regular expression, provided it comes
       
   422 %from the derivative of a star, is the same as the one nested.
       
   423  
       
   424  
       
   425  
       
   426  
       
   427  
       
   428  
       
   429  
       
   430  
   375  
   431  
   376 One might wonder the actual bound rather than the loose bound we gave
   432 One might wonder the actual bound rather than the loose bound we gave
   377 for the convenience of a easier proof.
   433 for the convenience of a easier proof.
   378 How much can the regex $r^* \backslash s$ grow? As hinted earlier,
   434 How much can the regex $r^* \backslash s$ grow? As hinted earlier,
   379  they can grow at a maximum speed
   435  they can grow at a maximum speed
   380 of exponential $w.r.t$ the number of characters.
   436   exponential $w.r.t$ the number of characters, 
   381 But they will eventually level off when the string $s$ is long enough,
   437 but will eventually level off when the string $s$ is long enough,
   382 as suggested by the finiteness bound proof.
   438 as suggested by the finiteness bound proof.
   383 
   439 And unfortunately we have concrete examples
       
   440 where such regexes grew exponentially large before levelling off:
       
   441 $(a ^ * + (a + aa) ^ * + (a + aa + aaa) ^ * + \ldots + 
       
   442 (a+ aa + aaa+\ldots \underbrace{a \ldots a}_{\text{n a's}})^*$ will already have a maximum
       
   443 size that is  exponential on the number $n$.
   384 %TODO: give out a graph showing how the size of the regex grows and levels off at a size exponential to the original regex
   444 %TODO: give out a graph showing how the size of the regex grows and levels off at a size exponential to the original regex
   385 
   445 
   386 
   446 
       
   447 While the tight upper bound will definitely be a lot lower than 
       
   448 the one we gave for the finiteness proof, 
       
   449 it will still be at least expoential, under our current simplification rules.
       
   450 
       
   451 This suggests stronger simplifications that keep the tight bound polynomial.
       
   452 
       
   453 %----------------------------------------------------------------------------------------
       
   454 %	SECTION 5
       
   455 %----------------------------------------------------------------------------------------
       
   456 \section{a stronger version of simplification}
       
   457 %TODO: search for isabelle proofs of algorithms that check equivalence 
       
   458 
       
   459