324 The next property gives the condition for |
337 The next property gives the condition for |
325 when $\rdistinct{\_}{\_}$ becomes an identical mapping |
338 when $\rdistinct{\_}{\_}$ becomes an identical mapping |
326 for any prefix of an input list, in other words, when can |
339 for any prefix of an input list, in other words, when can |
327 we ``push out" the arguments of $\rdistinct{\_}{\_}$: |
340 we ``push out" the arguments of $\rdistinct{\_}{\_}$: |
328 \begin{lemma} |
341 \begin{lemma} |
329 If $rs_1$'s elements are all unique, and not appearing in the accumulator $acc$, |
342 If $\textit{isDistinct} \; rs_1$, and $rs_1 \cap acc = \varnothing$, |
330 then it can be taken out and left untouched in the output: |
343 then it can be taken out and left untouched in the output: |
331 \[\textit{rdistinct}\; (rs_1 @ rsa)\;\, acc |
344 \[\textit{rdistinct}\; (rs_1 @ rsa)\;\, acc |
332 = rs_1@(\textit{rdistinct} rsa \; (acc \cup rs_1))\] |
345 = rs_1@(\textit{rdistinct} rsa \; (acc \cup rs_1))\] |
333 \end{lemma} |
346 \end{lemma} |
334 |
347 \noindent |
|
348 The predicate $\textit{isDistinct}$ is for testing |
|
349 whether a list's elements are all unique. It is defined |
|
350 recursively on the structure of a regular expression, |
|
351 and we omit the precise definition here. |
335 \begin{proof} |
352 \begin{proof} |
336 By an induction on $rs_1$, where $rsa$ and $acc$ are allowed to be arbitrary. |
353 By an induction on $rs_1$, where $rsa$ and $acc$ are allowed to be arbitrary. |
337 \end{proof} |
354 \end{proof} |
|
355 \noindent |
|
356 $\rdistinct{}{}$ removes any element in anywhere of a list, if it |
|
357 had appeared previously: |
|
358 \begin{lemma}\label{distinctRemovesMiddle} |
|
359 The two properties hold if $r \in rs$: |
|
360 \begin{itemize} |
|
361 \item |
|
362 $\rdistinct{rs}{rset} = \rdistinct{(rs @ [r])}{rset}$ |
|
363 and |
|
364 $\rdistinct{(ab :: rs @ [ab])}{rset'} = \rdistinct{(ab :: rs)}{rset'}$ |
|
365 \item |
|
366 $\rdistinct{ (rs @ rs') }{rset} = \rdistinct{rs @ [r] @ rs'}{rset}$ |
|
367 and |
|
368 $\rdistinct{(ab :: rs @ [ab] @ rs'')}{rset'} = |
|
369 \rdistinct{(ab :: rs @ rs'')}{rset'}$ |
|
370 \end{itemize} |
|
371 \end{lemma} |
|
372 \noindent |
|
373 \begin{proof} |
|
374 By induction on $rs$. All other variables are allowed to be arbitrary. |
|
375 The second half of the lemma requires the first half. |
|
376 Note that for each half's two sub-propositions need to be proven concurrently, |
|
377 so that the induction goes through. |
|
378 \end{proof} |
|
379 |
338 \subsubsection{The Properties of $\backslash_r$, $\backslash_{rsimp}$, $\textit{Rflts}$ and $\textit{Rsimp}_{ALTS}$} |
380 \subsubsection{The Properties of $\backslash_r$, $\backslash_{rsimp}$, $\textit{Rflts}$ and $\textit{Rsimp}_{ALTS}$} |
339 We give in this subsection some properties of how $\backslash_r$, $\backslash_{rsimp}$, $\textit{Rflts}$ and $\textit{Rsimp}_{ALTS} $ interact with each other and with $@$, the concatenation operator. |
381 We give in this subsection some properties of how $\backslash_r$, $\backslash_{rsimp}$, $\textit{Rflts}$ and $\textit{Rsimp}_{ALTS} $ interact with each other and with $@$, the concatenation operator. |
340 These will be helpful in later closed form proofs, when |
382 These will be helpful in later closed form proofs, when |
341 we want to transform the ways in which multiple functions involving |
383 we want to transform the ways in which multiple functions involving |
342 those are composed together |
384 those are composed together |
343 in interleaving derivative and simplification steps. |
385 in interleaving derivative and simplification steps. |
344 |
386 |
345 When the function $\textit{Rflts}$ |
387 When the function $\textit{Rflts}$ |
346 is applied to the concatenation of two lists, the output can be calculated by first applying the |
388 is applied to the concatenation of two lists, the output can be calculated by first applying the |
347 functions on two lists separately, and then concatenating them together. |
389 functions on two lists separately, and then concatenating them together. |
348 \begin{lemma} |
390 \begin{lemma}\label{rfltsProps} |
349 The function $\rflts$ has the below properties:\\ |
391 The function $\rflts$ has the below properties:\\ |
350 \begin{itemize} |
392 \begin{itemize} |
351 \item |
393 \item |
352 $\rflts \; (rs_1 @ rs_2) = \rflts \; rs_1 @ \rflts \; rs_2$ |
394 $\rflts \; (rs_1 @ rs_2) = \rflts \; rs_1 @ \rflts \; rs_2$ |
353 \item |
395 \item |
354 If $r \neq \RZERO$ and $\nexists rs_1. r = \RALTS{rs}_1$, then $\rflts \; (r::rs) = r :: \rflts \; rs$ |
396 If $r \neq \RZERO$ and $\nexists rs_1. r = \RALTS{rs}_1$, then $\rflts \; (r::rs) = r :: \rflts \; rs$ |
|
397 \item |
|
398 $\rflts \; (rs @ [\RZERO]) = \rflts \; rs$ |
|
399 \item |
|
400 $\rflts \; (rs' @ [\RALTS{rs}]) = \rflts \; rs'@rs$ |
|
401 \item |
|
402 $\rflts \; (rs @ [\RONE]) = \rflts \; rs @ [\RONE]$ |
|
403 \item |
|
404 If $r \neq \RZERO$ and $\nexists rs'. r = \RALTS{rs'}$ then $\rflts \; (rs @ [r]) |
|
405 = (\rflts \; rs) @ [r]$ |
355 \end{itemize} |
406 \end{itemize} |
356 \end{lemma} |
407 \end{lemma} |
357 \noindent |
408 \noindent |
358 \begin{proof} |
409 \begin{proof} |
359 By induction on $rs_1$. |
410 By induction on $rs_1$ in the first part, and induction on $r$ in the second part, |
360 \end{proof} |
411 and induction on $rs$, $rs'$ in the third and fourth sub-lemma. |
361 |
412 \end{proof} |
362 |
413 \subsubsection{The $RL$ Function: Language Interpretation of $\textit{Rrexp}$s} |
|
414 Much like the definition of $L$ on plain regular expressions, one could also |
|
415 define the language interpretation of $\rrexp$s. |
|
416 \begin{center} |
|
417 \begin{tabular}{lcl} |
|
418 $RL \; (\ZERO)$ & $\dn$ & $\phi$\\ |
|
419 $RL \; (\ONE)$ & $\dn$ & $\{[]\}$\\ |
|
420 $RL \; (c)$ & $\dn$ & $\{[c]\}$\\ |
|
421 $RL \; \sum rs$ & $\dn$ & $ \bigcup_{r \in rs} (RL \; r)$\\ |
|
422 $RL \; (r_1 \cdot r_2)$ & $\dn$ & $ RL \; (r_1) @ RL \; (r_2)$\\ |
|
423 $RL \; (r^*)$ & $\dn$ & $ (RL(r))^*$ |
|
424 \end{tabular} |
|
425 \end{center} |
|
426 \noindent |
|
427 The main use of $RL$ is to establish some connections between $\rsimp{}$ |
|
428 and $\rnullable{}$: |
|
429 \begin{lemma} |
|
430 The following properties hold: |
|
431 \begin{itemize} |
|
432 \item |
|
433 If $\rnullable{r}$, then $\rsimp{r} \neq \RZERO$. |
|
434 \item |
|
435 $\rnullable{r \backslash s} \quad $ if and only if $\quad \rnullable{\rderssimp{r}{s}}$. |
|
436 \end{itemize} |
|
437 \end{lemma} |
|
438 \begin{proof} |
|
439 The first part is by induction on $r$. |
|
440 The second part is true because property |
|
441 \[ RL \; r = RL \; (\rsimp{r})\] holds. |
|
442 \end{proof} |
|
443 |
|
444 \subsubsection{$\rsimp{}$ is Non-Increasing} |
|
445 In this subsection, we prove that the function $\rsimp{}$ does not |
|
446 make the $\llbracket \rrbracket_r$ size increase. |
|
447 |
|
448 |
|
449 \begin{lemma}\label{rsimpSize} |
|
450 $\llbracket \rsimp{r} \rrbracket_r \leq \llbracket r \rrbracket_r$ |
|
451 \end{lemma} |
|
452 \subsubsection{Simplified $\textit{Rrexp}$s are Good} |
|
453 We formalise the notion of ``good" regular expressions, |
|
454 which means regular expressions that |
|
455 are not fully simplified. For alternative regular expressions that means they |
|
456 do not contain any nested alternatives like |
|
457 \[ r_1 + (r_2 + r_3) \], un-removed $\RZERO$s like \[\RZERO + r\] |
|
458 or duplicate elements in a children regular expression list like \[ \sum [r, r, \ldots]\]: |
|
459 \begin{center} |
|
460 \begin{tabular}{@{}lcl@{}} |
|
461 $\good\; \RZERO$ & $\dn$ & $\textit{false}$\\ |
|
462 $\good\; \RONE$ & $\dn$ & $\textit{true}$\\ |
|
463 $\good\; \RCHAR{c}$ & $\dn$ & $\btrue$\\ |
|
464 $\good\; \RALTS{[]}$ & $\dn$ & $\bfalse$\\ |
|
465 $\good\; \RALTS{[r]}$ & $\dn$ & $\bfalse$\\ |
|
466 $\good\; \RALTS{r_1 :: r_2 :: rs}$ & $\dn$ & |
|
467 $\textit{isDistinct} \; (r_1 :: r_2 :: rs) \;$\\ |
|
468 & & $\textit{and}\; (\forall r' \in (r_1 :: r_2 :: rs).\; \good \; r'\; \, \textit{and}\; \, \textit{nonAlt}\; r')$\\ |
|
469 $\good \; \RSEQ{\RZERO}{r}$ & $\dn$ & $\bfalse$\\ |
|
470 $\good \; \RSEQ{\RONE}{r}$ & $\dn$ & $\bfalse$\\ |
|
471 $\good \; \RSEQ{r}{\RZERO}$ & $\dn$ & $\bfalse$\\ |
|
472 $\good \; \RSEQ{r_1}{r_2}$ & $\dn$ & $\good \; r_1 \;\, \textit{and} \;\, \good \; r_2$\\ |
|
473 $\good \; \RSTAR{r}$ & $\dn$ & $\btrue$\\ |
|
474 \end{tabular} |
|
475 \end{center} |
|
476 \noindent |
|
477 The predicate $\textit{nonAlt}$ evaluates to true when the regular expression is not an |
|
478 alternative, and false otherwise. |
|
479 The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that |
|
480 its non-empty argument list of expressions are all good themsleves, and $\textit{nonAlt}$, |
|
481 and unique: |
|
482 \begin{lemma}\label{rsimpaltsGood} |
|
483 If $rs \neq []$ and forall $r \in rs. \textit{nonAlt} \; r$ and $\textit{isDistinct} \; rs$, |
|
484 then $\good \; (\rsimpalts \; rs)$ if and only if forall $r \in rs. \; \good \; r$. |
|
485 \end{lemma} |
|
486 \noindent |
|
487 We also note that |
|
488 if a regular expression $r$ is good, then $\rflts$ on the singleton |
|
489 list $[r]$ will not break goodness: |
|
490 \begin{lemma}\label{flts2} |
|
491 If $\good \; r$, then forall $r' \in \rflts \; [r]. \; \good \; r'$ and $\textit{nonAlt} \; r'$. |
|
492 \end{lemma} |
|
493 \begin{proof} |
|
494 By an induction on $r$. |
|
495 \end{proof} |
|
496 \noindent |
|
497 The other observation we make about $\rsimp{r}$ is that it never |
|
498 comes with nested alternatives, which we describe as the $\nonnested$ |
|
499 property: |
|
500 \begin{center} |
|
501 \begin{tabular}{lcl} |
|
502 $\nonnested \; \, \sum []$ & $\dn$ & $\btrue$\\ |
|
503 $\nonnested \; \, \sum ((\sum rs_1) :: rs_2)$ & $\dn$ & $\bfalse$\\ |
|
504 $\nonnested \; \, \sum (r :: rs)$ & $\dn$ & $\nonnested (\sum rs)$\\ |
|
505 $\nonnested \; \, r $ & $\dn$ & $\btrue$ |
|
506 \end{tabular} |
|
507 \end{center} |
|
508 \noindent |
|
509 The $\rflts$ function |
|
510 always opens up nested alternatives, |
|
511 which enables $\rsimp$ to be non-nested: |
|
512 |
|
513 \begin{lemma}\label{nonnestedRsimp} |
|
514 $\nonnested \; (\rsimp{r})$ |
|
515 \end{lemma} |
|
516 \begin{proof} |
|
517 By an induction on $r$. |
|
518 \end{proof} |
|
519 \noindent |
|
520 With this we could prove that a regular expressions |
|
521 after simplification and flattening and de-duplication, |
|
522 will not contain any alternative regular expression directly: |
|
523 \begin{lemma}\label{nonaltFltsRd} |
|
524 If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$ |
|
525 then $\textit{nonAlt} \; x$. |
|
526 \end{lemma} |
|
527 \begin{proof} |
|
528 By \ref{nonnestedRsimp}. |
|
529 \end{proof} |
|
530 \noindent |
|
531 The other thing we know is that once $\rsimp{}$ had finished |
|
532 processing an alternative regular expression, it will not |
|
533 contain any $\RZERO$s, this is because all the recursive |
|
534 calls to the simplification on the children regular expressions |
|
535 make the children good, and $\rflts$ will not take out |
|
536 any $\RZERO$s out of a good regular expression list, |
|
537 and $\rdistinct{}$ will not mess with the result. |
|
538 \begin{lemma}\label{flts3Obv} |
|
539 The following are true: |
|
540 \begin{itemize} |
|
541 \item |
|
542 If for all $r \in rs. \, \good \; r $ or $r = \RZERO$, |
|
543 then for all $r \in \rflts\; rs. \, \good \; r$. |
|
544 \item |
|
545 If $x \in \rdistinct{\rflts\; (\map \; rsimp{}\; rs)}{\varnothing}$ |
|
546 and for all $y$ such that $\llbracket y \rrbracket_r$ less than |
|
547 $\llbracket rs \rrbracket_r + 1$, either |
|
548 $\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$, |
|
549 then $\good \; x$. |
|
550 \end{itemize} |
|
551 \end{lemma} |
|
552 \begin{proof} |
|
553 The first part is by induction on $rs$, where the induction |
|
554 rule is the inductive cases for $\rflts$. |
|
555 The second part is a corollary from the first part. |
|
556 \end{proof} |
|
557 |
|
558 And this leads to good structural property of $\rsimp{}$, |
|
559 that after simplification, a regular expression is |
|
560 either good or $\RZERO$: |
|
561 \begin{lemma}\label{good1} |
|
562 For any r-regular expression $r$, $\good \; \rsimp{r}$ or $\rsimp{r} = \RZERO$. |
|
563 \end{lemma} |
|
564 \begin{proof} |
|
565 By an induction on $r$. The inductive measure is the size $\llbracket \rrbracket_r$. |
|
566 Lemma \ref{rsimpSize} says that |
|
567 $\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to |
|
568 $\llbracket r \rrbracket_r$. |
|
569 Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ case, |
|
570 Inductive hypothesis applies to the children regular expressions |
|
571 $r_1$, $r_2$, etc. The lemma \ref{flts3Obv}'s precondition is satisfied |
|
572 by that as well. |
|
573 The lemmas \ref{nonnestedRsimp} and \ref{nonaltFltsRd} are used |
|
574 to ensure that goodness is preserved at the topmost level. |
|
575 \end{proof} |
|
576 We shall prove that any good regular expression is |
|
577 a fixed-point for $\rsimp{}$. |
|
578 First we prove an auxiliary lemma: |
|
579 \begin{lemma}\label{goodaltsNonalt} |
|
580 If $\good \; \sum rs$, then $\rflts\; rs = rs$. |
|
581 \end{lemma} |
|
582 \begin{proof} |
|
583 By an induction on $\sum rs$. The inductive rules are the cases |
|
584 for $\good$. |
|
585 \end{proof} |
|
586 \noindent |
|
587 Now we are ready to prove that good regular expressions are invariant |
|
588 of $\rsimp{}$ application: |
|
589 \begin{lemma}\label{test} |
|
590 If $\good \;r$ then $\rsimp{r} = r$. |
|
591 \end{lemma} |
|
592 \begin{proof} |
|
593 By an induction on the inductive cases of $\good$. |
|
594 The lemma \ref{goodaltsNonalt} is used in the alternative |
|
595 case where 2 or more elements are present in the list. |
|
596 \end{proof} |
|
597 |
|
598 \subsubsection{$\rsimp$ is Idempotent} |
|
599 The idempotency of $\rsimp$ is very useful in |
|
600 manipulating regular expression terms into desired |
|
601 forms so that key steps allowing further rewriting to closed forms |
|
602 are possible. |
|
603 \begin{lemma}\label{rsimpIdem} |
|
604 $\rsimp{r} = \rsimp{\rsimp{r}}$ |
|
605 \end{lemma} |
|
606 |
|
607 \begin{proof} |
|
608 By \ref{test} and \ref{good1}. |
|
609 \end{proof} |
|
610 \noindent |
|
611 This property means we do not have to repeatedly |
|
612 apply simplification in each step, which justifies |
|
613 our definition of $\blexersimp$. |
|
614 |
|
615 |
|
616 On the other hand, we could repeat the same $\rsimp{}$ applications |
|
617 on regular expressions as many times as we want, if we have at least |
|
618 one simplification applied to it, and apply it wherever we would like to: |
|
619 \begin{corollary}\label{headOneMoreSimp} |
|
620 $\map \; \rsimp{(r :: rs)} = \map \; \rsimp{} \; (\rsimp{r} :: rs)$ |
|
621 \end{corollary} |
|
622 \noindent |
|
623 This will be useful in later closed form proof's rewriting steps. |
|
624 Similarly, we point out the following useful facts below: |
|
625 \begin{lemma} |
|
626 The following equalities hold if $r = \rsimp{r'}$ for some $r'$: |
|
627 \begin{itemize} |
|
628 \item |
|
629 If $r = \sum rs$ then $\rsimpalts \; rs = \sum rs$. |
|
630 \item |
|
631 If $r = \sum rs$ then $\rdistinct{rs}{\varnothing} = rs$. |
|
632 \item |
|
633 $\rsimpalts \; (\rdistinct{\rflts \; [r]}{\varnothing}) = r$. |
|
634 \end{itemize} |
|
635 \end{lemma} |
|
636 \begin{proof} |
|
637 By application of \ref{rsimpIdem} and \ref{good1}. |
|
638 \end{proof} |
|
639 |
|
640 \noindent |
|
641 With the idempotency of $\rsimp{}$ and its corollaries, |
|
642 we can start proving some key equalities leading to the |
|
643 closed forms. |
|
644 Now presented are a few equivalent terms under $\rsimp{}$. |
|
645 We use $r_1 \sequal r_2 $ here to denote $\rsimp{r_1} = \rsimp{r_2}$. |
|
646 \begin{lemma} |
|
647 \begin{itemize} |
|
648 \item |
|
649 $\rsimpalts \; (\RZERO :: rs) \sequal \rsimpalts\; rs$ |
|
650 \item |
|
651 $\rsimpalts \; rs \sequal \rsimpalts (\map \; \rsimp{} \; rs)$ |
|
652 \item |
|
653 $\RALTS{\RALTS{rs}} \sequal \RALTS{rs}$ |
|
654 \end{itemize} |
|
655 \end{lemma} |
|
656 \noindent |
|
657 We need more equalities like the above to enable a closed form, |
|
658 but to proceed we need to introduce two rewrite relations, |
|
659 to make things smoother. |
|
660 \subsubsection{The rewrite relation $\hrewrite$ and $\grewrite$} |
|
661 Insired by the success we had in the correctness proof |
|
662 in \ref{Bitcoded2}, where we invented |
|
663 a term rewriting system to capture the similarity between terms |
|
664 and prove equivalence, we follow suit here defining simplification |
|
665 steps as rewriting steps. |
|
666 The presentation will be more concise than that in \ref{Bitcoded2}. |
|
667 To differentiate between the rewriting steps for annotated regular expressions |
|
668 and $\rrexp$s, we add characters $h$ and $g$ below the squig arrow symbol |
|
669 to mean atomic simplification transitions |
|
670 of $\rrexp$s and $\rrexp$ lists, respectively. |
|
671 |
|
672 \begin{center} |
|
673 |
|
674 List of 1-step rewrite rules for regular expressions simplification without bitcodes: |
|
675 \begin{figure} |
|
676 \caption{the "h-rewrite" rules} |
|
677 \[ |
|
678 r_1 \cdot \ZERO \rightarrow_h \ZERO \] |
|
679 |
|
680 \[ |
|
681 \ZERO \cdot r_2 \rightarrow \ZERO |
|
682 \] |
|
683 \end{figure} |
|
684 And we define an "grewrite" relation that works on lists: |
|
685 \begin{center} |
|
686 \begin{tabular}{lcl} |
|
687 $ \ZERO :: rs$ & $\rightarrow_g$ & $rs$ |
|
688 \end{tabular} |
|
689 \end{center} |
|
690 |
|
691 |
|
692 |
|
693 With these relations we prove |
|
694 \begin{lemma} |
|
695 $rs \rightarrow rs' \implies \RALTS{rs} \rightarrow \RALTS{rs'}$ |
|
696 \end{lemma} |
|
697 which enables us to prove "closed forms" equalities such as |
|
698 \begin{lemma} |
|
699 $\sflat{(r_1 \cdot r_2) \backslash s} = \RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\suffix \; s \; r_1 ))}$ |
|
700 \end{lemma} |
|
701 |
|
702 But the most involved part of the above lemma is proving the following: |
|
703 \begin{lemma} |
|
704 $\bsimp{\RALTS{rs @ \RALTS{rs_1} @ rs'}} = \bsimp{\RALTS{rs @rs_1 @ rs'}} $ |
|
705 \end{lemma} |
|
706 which is needed in proving things like |
|
707 \begin{lemma} |
|
708 $r \rightarrow_f r' \implies \rsimp{r} \rightarrow \rsimp{r'}$ |
|
709 \end{lemma} |
|
710 |
|
711 Fortunately this is provable by induction on the list $rs$ |
363 |
712 |
364 \subsection{A Closed Form for the Sequence Regular Expression} |
713 \subsection{A Closed Form for the Sequence Regular Expression} |
365 \begin{quote}\it |
714 \begin{quote}\it |
366 Claim: For regular expressions $r_1 \cdot r_2$, we claim that |
715 Claim: For regular expressions $r_1 \cdot r_2$, we claim that |
367 \begin{center} |
716 \begin{center} |