366 regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2 |
366 regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2 |
367 elements. |
367 elements. |
368 We give a predicate for such "star-created" regular expressions: |
368 We give a predicate for such "star-created" regular expressions: |
369 \begin{center} |
369 \begin{center} |
370 \begin{tabular}{lcr} |
370 \begin{tabular}{lcr} |
371 & & $\createdByStar{\RSEQ{ra}{\RSTAR{rb}}}$\\ |
371 & & $\createdByStar{(\RSEQ{ra}{\RSTAR{rb}}) }$\\ |
372 $\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$ |
372 $\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$ |
373 \end{tabular} |
373 \end{tabular} |
374 \end{center} |
374 \end{center} |
|
375 |
|
376 These definitions allows us the flexibility to talk about |
|
377 regular expressions in their most convenient format, |
|
378 for example, flattened out $\RALTS{[r_1, r_2, \ldots, r_n]} $ |
|
379 instead of binary-nested: $((r_1 + r_2) + (r_3 + r_4)) + \ldots$. |
|
380 These definitions help express that certain classes of syntatically |
|
381 distinct regular expressions are actually the same under simplification. |
|
382 This is not entirely true for annotated regular expressions: |
|
383 %TODO: bsimp bders \neq bderssimp |
|
384 \begin{center} |
|
385 $(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$ |
|
386 \end{center} |
|
387 For bit-codes, the order in which simplification is applied |
|
388 might cause a difference in the location they are placed. |
|
389 If we want something like |
|
390 \begin{center} |
|
391 $\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$ |
|
392 \end{center} |
|
393 Some "canonicalization" procedure is required, |
|
394 which either pushes all the common bitcodes to nodes |
|
395 as senior as possible: |
|
396 \begin{center} |
|
397 $_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $ |
|
398 \end{center} |
|
399 or does the reverse. However bitcodes are not of interest if we are talking about |
|
400 the $\llbracket r \rrbracket$ size of a regex. |
|
401 Therefore for the ease and simplicity of producing a |
|
402 proof for a size bound, we are happy to restrict ourselves to |
|
403 unannotated regular expressions, and obtain such equalities as |
|
404 \begin{lemma} |
|
405 $\rsimp{(r_1 + r_2)} = \rsimp{\RALTS{\hflataux{r_1} @ \hflataux{r_2}}}$ |
|
406 \end{lemma} |
|
407 |
|
408 \begin{proof} |
|
409 By using the rewriting relation $\rightsquigarrow$ |
|
410 \end{proof} |
|
411 %TODO: rsimp sflat |
|
412 And from this we obtain a proof that a star's derivative will be the same |
|
413 as if it had all its nested alternatives created during deriving being flattened out: |
|
414 For example, |
|
415 \begin{lemma} |
|
416 $\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$ |
|
417 \end{lemma} |
|
418 \begin{proof} |
|
419 By structural induction on $r$, where the induction rules are these of $\createdByStar{_}$. |
|
420 \end{proof} |
|
421 % The simplification of a flattened out regular expression, provided it comes |
|
422 %from the derivative of a star, is the same as the one nested. |
|
423 |
|
424 |
|
425 |
|
426 |
|
427 |
|
428 |
|
429 |
|
430 |
375 |
431 |
376 One might wonder the actual bound rather than the loose bound we gave |
432 One might wonder the actual bound rather than the loose bound we gave |
377 for the convenience of a easier proof. |
433 for the convenience of a easier proof. |
378 How much can the regex $r^* \backslash s$ grow? As hinted earlier, |
434 How much can the regex $r^* \backslash s$ grow? As hinted earlier, |
379 they can grow at a maximum speed |
435 they can grow at a maximum speed |
380 of exponential $w.r.t$ the number of characters. |
436 exponential $w.r.t$ the number of characters, |
381 But they will eventually level off when the string $s$ is long enough, |
437 but will eventually level off when the string $s$ is long enough, |
382 as suggested by the finiteness bound proof. |
438 as suggested by the finiteness bound proof. |
383 |
439 And unfortunately we have concrete examples |
|
440 where such regexes grew exponentially large before levelling off: |
|
441 $(a ^ * + (a + aa) ^ * + (a + aa + aaa) ^ * + \ldots + |
|
442 (a+ aa + aaa+\ldots \underbrace{a \ldots a}_{\text{n a's}})^*$ will already have a maximum |
|
443 size that is exponential on the number $n$. |
384 %TODO: give out a graph showing how the size of the regex grows and levels off at a size exponential to the original regex |
444 %TODO: give out a graph showing how the size of the regex grows and levels off at a size exponential to the original regex |
385 |
445 |
386 |
446 |
|
447 While the tight upper bound will definitely be a lot lower than |
|
448 the one we gave for the finiteness proof, |
|
449 it will still be at least expoential, under our current simplification rules. |
|
450 |
|
451 This suggests stronger simplifications that keep the tight bound polynomial. |
|
452 |
|
453 %---------------------------------------------------------------------------------------- |
|
454 % SECTION 5 |
|
455 %---------------------------------------------------------------------------------------- |
|
456 \section{a stronger version of simplification} |
|
457 %TODO: search for isabelle proofs of algorithms that check equivalence |
|
458 |
|
459 |