539 add one more line which can be solved by sledgehammer |
539 add one more line which can be solved by sledgehammer |
540 to solve the $r^{\{n\}}$ inductive case. |
540 to solve the $r^{\{n\}}$ inductive case. |
541 |
541 |
542 |
542 |
543 \subsubsection{Finiteness Proof Augmentation} |
543 \subsubsection{Finiteness Proof Augmentation} |
544 The bounded repetitions can be handled |
544 The bounded repetitions are |
545 using techniques similar to stars. |
545 very similar to stars, and therefore the treatment |
|
546 is similar, with minor changes to handle some slight complications |
|
547 when the counter reaches 0. |
|
548 The exponential growth is similar: |
|
549 \begin{center} |
|
550 \begin{tabular}{ll} |
|
551 $r^{\{n\}} $ & $\longrightarrow_{\backslash c}$\\ |
|
552 $(r\backslash c) \cdot |
|
553 r^{\{n - 1\}}*$ & $\longrightarrow_{\backslash c'}$\\ |
|
554 \\ |
|
555 $r \backslash cc' \cdot r^{\{n - 2\}}* + |
|
556 r \backslash c' \cdot r^{\{n - 1\}}*$ & |
|
557 $\longrightarrow_{\backslash c''}$\\ |
|
558 \\ |
|
559 $(r_1 \backslash cc'c'' \cdot r^{\{n-3\}}* + |
|
560 r \backslash c''\cdot r^{\{n-1\}}) + |
|
561 (r \backslash c'c'' \cdot r^{\{n-2\}}* + |
|
562 r \backslash c'' \cdot r^{\{n-1\}}*)$ & |
|
563 $\longrightarrow_{\backslash c'''}$ \\ |
|
564 \\ |
|
565 $\ldots$\\ |
|
566 \end{tabular} |
|
567 \end{center} |
|
568 Again, we assume that $r\backslash c$, $r \backslash cc'$ and so on |
|
569 are all nullable. |
|
570 The flattened list of terms for $r^{\{n\}} \backslash_{rs} s$ |
|
571 \begin{center} |
|
572 $[r_1 \backslash cc'c'' \cdot r^{\{n-3\}}*,\; |
|
573 r \backslash c''\cdot r^{\{n-1\}}, \; |
|
574 r \backslash c'c'' \cdot r^{\{n-2\}}*, \; |
|
575 r \backslash c'' \cdot r^{\{n-1\}}*,\; \ldots ]$ |
|
576 \end{center} |
|
577 that comes from |
|
578 \begin{center} |
|
579 $(r_1 \backslash cc'c'' \cdot r^{\{n-3\}}* + |
|
580 r \backslash c''\cdot r^{\{n-1\}}) + |
|
581 (r \backslash c'c'' \cdot r^{\{n-2\}}* + |
|
582 r \backslash c'' \cdot r^{\{n-1\}}*)+ \ldots$ |
|
583 \end{center} |
|
584 are made of sequences with different tails, where the counters |
|
585 might differ. |
|
586 The observation for maintaining the bound is that |
|
587 these counters never exceed $n$, the original |
|
588 counter. With the number of counters staying finite, |
|
589 $\rDistinct$ will deduplicate and keep the list finite. |
|
590 We introduce this idea as a lemma once we describe all |
|
591 the necessary helper functions. |
|
592 |
|
593 Similar to the star case, we want |
|
594 \begin{center} |
|
595 $\rderssimp{r^{\{n\}}}{s} = \rsimp{\sum rs}$. |
|
596 \end{center} |
|
597 where $rs$ |
|
598 shall be in the form of |
|
599 $\map \; f \; Ss$, where $f$ is a function and |
|
600 $Ss$ a list of objects to act on. |
|
601 For star, the object's datatype is string. |
|
602 The list of strings $Ss$ |
|
603 is generated using functions |
|
604 $\starupdate$ and $\starupdates$. |
|
605 The function that takes a string and returns a regular expression |
|
606 is the anonymous function $ |
|
607 (\lambda s'. \; r\backslash s' \cdot r^{\{m\}})$. |
|
608 In the NTIMES setting, |
|
609 the $\starupdate$ and $\starupdates$ functions are replaced by |
|
610 $\textit{nupdate}$ and $\textit{nupdates}$: |
|
611 \begin{center} |
|
612 \begin{tabular}{lcl} |
|
613 $\nupdate \; c \; r \; [] $ & $\dn$ & $[]$\\ |
|
614 $\nupdate \; c \; r \; |
|
615 (\Some \; (s, \; n + 1) \; :: \; Ss)$ & $\dn$ & %\\ |
|
616 $\textit{if} \; |
|
617 (\rnullable \; (r \backslash_{rs} s))$ \\ |
|
618 & & $\;\;\textit{then} |
|
619 \;\; \Some \; (s @ [c], n + 1) :: \Some \; ([c], n) :: ( |
|
620 \nupdate \; c \; r \; Ss)$ \\ |
|
621 & & $\textit{else} \;\; \Some \; (s @ [c], n+1) :: ( |
|
622 \nupdate \; c \; r \; Ss)$\\ |
|
623 $\nupdate \; c \; r \; (\textit{None} :: Ss)$ & $\dn$ & |
|
624 $(\None :: \nupdate \; c \; r \; Ss)$\\ |
|
625 & & \\ |
|
626 %\end{tabular} |
|
627 %\end{center} |
|
628 %\begin{center} |
|
629 %\begin{tabular}{lcl} |
|
630 $\nupdates \; [] \; r \; Ss$ & $\dn$ & $Ss$\\ |
|
631 $\nupdates \; (c :: cs) \; r \; Ss$ & $\dn$ & $\nupdates \; cs \; r \; ( |
|
632 \nupdate \; c \; r \; Ss)$ |
|
633 \end{tabular} |
|
634 \end{center} |
|
635 \noindent |
|
636 which take into account when a subterm |
|
637 \begin{center} |
|
638 $r \backslash_s s \cdot r^{\{n\}}$ |
|
639 \end{center} |
|
640 counter $n$ |
|
641 is 0, and therefore expands to |
|
642 \begin{center} |
|
643 $r \backslash_s (s@[c]) \cdot r^{\{n\}} \;+ |
|
644 \; \ZERO$ |
|
645 \end{center} |
|
646 after taking a derivative. |
|
647 The object now has type |
|
648 \begin{center} |
|
649 $\textit{option} \;(\textit{string}, \textit{nat})$ |
|
650 \end{center} |
|
651 and therefore the function for converting such an option into |
|
652 a regular expression term is called $\opterm$: |
|
653 |
|
654 \begin{center} |
|
655 \begin{tabular}{lcl} |
|
656 $\opterm \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\ |
|
657 & & $\;\;\Some \; (s, n) \Rightarrow |
|
658 (r\backslash_{rs} s)\cdot r^{\{n\}}$\\ |
|
659 & & $\;\;\None \Rightarrow |
|
660 \ZERO$\\ |
|
661 \end{tabular} |
|
662 \end{center} |
|
663 \noindent |
|
664 Put together, the list $\map \; f \; Ss$ is instantiated as |
|
665 \begin{center} |
|
666 $\map \; (\opterm \; r) \; (\nupdates \; s \; r \; |
|
667 [\Some \; ([c], n)])$. |
|
668 \end{center} |
|
669 For the closed form to be bounded, we would like |
|
670 simplification to be applied to each term in the list. |
|
671 Therefore we introduce some variants of $\opterm$, |
|
672 which help conveniently express the rewriting steps |
|
673 needed in the closed form proof. |
|
674 \begin{center} |
|
675 \begin{tabular}{lcl} |
|
676 $\optermOsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\ |
|
677 & & $\;\;\Some \; (s, n) \Rightarrow |
|
678 \textit{rsimp} \; ((r\backslash_{rs} s)\cdot r^{\{n\}})$\\ |
|
679 & & $\;\;\None \Rightarrow |
|
680 \ZERO$\\ |
|
681 \\ |
|
682 $\optermosimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\ |
|
683 & & $\;\;\Some \; (s, n) \Rightarrow |
|
684 (\textit{rsimp} \; (r\backslash_{rs} s)) |
|
685 \cdot r^{\{n\}}$\\ |
|
686 & & $\;\;\None \Rightarrow |
|
687 \ZERO$\\ |
|
688 \\ |
|
689 $\optermsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\ |
|
690 & & $\;\;\Some \; (s, n) \Rightarrow |
|
691 (r\backslash_{rsimps} s)\cdot r^{\{n\}}$\\ |
|
692 & & $\;\;\None \Rightarrow |
|
693 \ZERO$\\ |
|
694 \end{tabular} |
|
695 \end{center} |
|
696 |
|
697 |
|
698 For a list of |
|
699 $\textit{option} \;(\textit{string}, \textit{nat})$ elements, |
|
700 we define the highest power for it recursively: |
|
701 \begin{center} |
|
702 \begin{tabular}{lcl} |
|
703 $\hpa \; [] \; n $ & $\dn$ & $n$\\ |
|
704 $\hpa \; (\None :: os) \; n $ & $\dn$ & $\hpa \; os \; n$\\ |
|
705 $\hpa \; (\Some \; (s, n) :: os) \; m$ & $\dn$ & |
|
706 $\hpa \;os \; (\textit{max} \; n\; m)$\\ |
|
707 \\ |
|
708 $\hpower \; rs $ & $\dn$ & $\hpa \; rs \; 0$\\ |
|
709 \end{tabular} |
|
710 \end{center} |
|
711 |
|
712 Now the intuition that an NTIMES regular expression's power |
|
713 does not increase can be easily expressed as |
|
714 \begin{lemma}\label{nupdatesMono2} |
|
715 $\hpower \; (\nupdates \;s \; r \; [\Some \; ([c], n)]) \leq n$ |
|
716 \end{lemma} |
|
717 \begin{proof} |
|
718 Note that the power is non-increasing after a $\nupdate$ application: |
|
719 \begin{center} |
|
720 $\hpa \;\; (\nupdate \; c \; r \; Ss)\;\; m \leq |
|
721 \hpa\; \; Ss \; m$. |
|
722 \end{center} |
|
723 This is also the case for $\nupdates$: |
|
724 \begin{center} |
|
725 $\hpa \;\; (\nupdates \; s \; r \; Ss)\;\; m \leq |
|
726 \hpa\; \; Ss \; m$. |
|
727 \end{center} |
|
728 Therefore we have that |
|
729 \begin{center} |
|
730 $\hpower \;\; (\nupdates \; s \; r \; Ss) \leq |
|
731 \hpower \;\; Ss$ |
|
732 \end{center} |
|
733 which leads to the lemma being proven. |
|
734 |
|
735 \end{proof} |
|
736 |
|
737 |
|
738 We also define the inductive rules for |
|
739 the shape of derivatives of the NTIMES regular expressions:\\[-3em] |
|
740 \begin{center} |
|
741 \begin{mathpar} |
|
742 \inferrule{\mbox{}}{\cbn \;\ZERO} |
|
743 |
|
744 \inferrule{\mbox{}}{\cbn \; \; r_a \cdot (r^{\{n\}})} |
|
745 |
|
746 \inferrule{\cbn \; r_1 \;\; \; \cbn \; r_2}{\cbn \; r_1 + r_2} |
|
747 |
|
748 \inferrule{\cbn \; r}{\cbn \; r + \ZERO} |
|
749 \end{mathpar} |
|
750 \end{center} |
|
751 \noindent |
|
752 A derivative of NTIMES fits into the shape described by $\cbn$: |
|
753 \begin{lemma}\label{ntimesDersCbn} |
|
754 $\cbn \; ((r' \cdot r^{\{n\}}) \backslash_{rs} s)$ holds. |
|
755 \end{lemma} |
|
756 \begin{proof} |
|
757 By a reverse induction on $s$. |
|
758 For the inductive case, note that if $\cbn \; r$ holds, |
|
759 then $\cbn \; (r\backslash_r c)$ holds. |
|
760 \end{proof} |
|
761 In addition, for $\cbn$-shaped regular expressioins, one can flatten |
|
762 them: |
|
763 \begin{lemma}\label{ntimesHfauPushin} |
|
764 If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} = |
|
765 \textit{concat} \; (\map \; \hflataux{\map \; (\_\backslash_r c) \; |
|
766 (\hflataux{r})})$ |
|
767 \end{lemma} |
|
768 \begin{proof} |
|
769 By an induction on the inductive cases of $\cbn$. |
|
770 \end{proof} |
|
771 |
|
772 |
|
773 |
|
774 This time we do not need to define the flattening functions for NTIMES only, |
|
775 because $\hflat{\_}$ and $\hflataux{\_}$ work on NTIMES already: |
|
776 \begin{lemma}\label{ntimesHfauInduct} |
|
777 $\hflataux{( (r\backslash_r c) \cdot r^{\{n\}}) \backslash_{rsimps} s} = |
|
778 \map \; (\opterm \; r) \; (\nupdates \; s \; r \; [\Some \; ([c], n)])$ |
|
779 \end{lemma} |
|
780 \begin{proof} |
|
781 By a reverse induction on $s$. |
|
782 The lemma \ref{ntimesDersCbn} is used. |
|
783 \end{proof} |
|
784 \noindent |
|
785 We have a recursive property for NTIMES with $\nupdate$ |
|
786 similar to that for STAR, |
|
787 and one for $\nupdates $ as well: |
|
788 \begin{lemma}\label{nupdateInduct1} |
|
789 (i) $\textit{concat} \; (\map \; (\hflataux{\_} \circ ( |
|
790 \opterm \; r)) \; Ss) = \map \; (\opterm \; r) \; (\nupdate \; |
|
791 c \; r \; Ss)$\\ |
|
792 (ii) $\textit{concat} \; (\map \; \hflataux{\_}\; |
|
793 \map \; (\_\backslash_r x) \; |
|
794 (\map \; (\opterm \; r) \; (\nupdates \; xs \; r \; Ss))) = |
|
795 \map \; (\opterm \; r) \; (\nupdates \;(xs@[x]) \; r\;Ss)$ holds. |
|
796 \end{lemma} |
|
797 \begin{proof} |
|
798 (i) is by an induction on $Ss$. |
|
799 (ii) is by an induction on $xs$. |
|
800 \end{proof} |
|
801 |
|
802 The $\nString$ predicate is defined for conveniently |
|
803 expressing that there are no empty strings in the |
|
804 $\Some \;(s, n)$ elements generated by $\nupdate$: |
|
805 \begin{center} |
|
806 \begin{tabular}{lcl} |
|
807 $\nString \; \None$ & $\dn$ & $ \textit{true}$\\ |
|
808 $\nString \; (\Some \; ([], n))$ & $\dn$ & $ \textit{false}$\\ |
|
809 $\nString \; (\Some \; (c::s, n))$ & $\dn$ & $ \textit{true}$\\ |
|
810 \end{tabular} |
|
811 \end{center} |
|
812 \begin{lemma}\label{nupdatesNonempty} |
|
813 If for all elements $opt$ in $\textit{set} \; Ss$, |
|
814 $\nString \; opt$ holds, the we have that |
|
815 for all elements $opt$ in $\textit{set} \; (\nupdates \; s \; r \; Ss)$, |
|
816 $\nString \; opt$ holds. |
|
817 \end{lemma} |
|
818 \begin{proof} |
|
819 By an induction on $s$, where $Ss$ is set to vary over all possible values. |
|
820 \end{proof} |
|
821 |
|
822 \noindent |
|
823 |
|
824 \begin{lemma}\label{ntimesClosedFormsSteps} |
|
825 The following list of equalities or rewriting relations hold:\\ |
|
826 (i) $r^{\{n+1\}} \backslash_{rsimps} (c::s) = |
|
827 \textit{rsimp} \; (\sum (\map \; (\opterm \;r \;\_) \; (\nupdates \; |
|
828 s \; r \; [\Some \; ([c], n)])))$\\ |
|
829 (ii) |
|
830 \begin{center} |
|
831 $\sum (\map \; (\opterm \; r) \; (\nupdates \; s \; r \; [ |
|
832 \Some \; ([c], n)]))$ \\ $ \sequal$\\ |
|
833 $\sum (\map \; (\textit{rsimp} \circ (\opterm \; r))\; (\nupdates \; |
|
834 s\;r \; [\Some \; ([c], n)]))$\\ |
|
835 \end{center} |
|
836 (iii) |
|
837 \begin{center} |
|
838 $\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \; |
|
839 ([c], n)]))$\\ |
|
840 $\sequal$\\ |
|
841 $\sum \;(\map \; (\optermsimp r) \; (\nupdates \; s \; r \; [\Some \; |
|
842 ([c], n)])) $\\ |
|
843 \end{center} |
|
844 (iv) |
|
845 \begin{center} |
|
846 $\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \; |
|
847 ([c], n)])) $ \\ $\sequal$\\ |
|
848 $\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \; |
|
849 ([c], n)])) $\\ |
|
850 \end{center} |
|
851 (v) |
|
852 \begin{center} |
|
853 $\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \; |
|
854 ([c], n)])) $ \\ $\sequal$\\ |
|
855 $\sum \; (\map \; (\textit{rsimp} \circ (\opterm \; r)) \; |
|
856 (\nupdates \; s \; r \; [\Some \; ([c], n)]))$ |
|
857 \end{center} |
|
858 \end{lemma} |
|
859 \begin{proof} |
|
860 Routine. |
|
861 (iii) and (iv) make use of the fact that all the strings $s$ |
|
862 inside $\Some \; (s, m)$ which are elements of the list |
|
863 $\nupdates \; s\;r\;[\Some\; ([c], n)]$ are non-empty, |
|
864 which is from lemma \ref{nupdatesNonempty}. |
|
865 Once the string in $o = \Some \; (s, n)$ is |
|
866 nonempty, $\optermsimp \; r \;o$, |
|
867 $\optermosimp \; r \; o$ and $\optermosimp \; \; o$ are guaranteed |
|
868 to be equal. |
|
869 \end{proof} |
|
870 |
|
871 \begin{theorem}\label{ntimesClosedForm} |
|
872 The derivative of $r^{\{n+1\}}$ can be described as an alternative |
|
873 containing a list |
|
874 of terms:\\ |
|
875 $r^{\{n+1\}} \backslash_{rsimps} (c::s) = \textit{rsimp} \; ( |
|
876 \sum (\map \; (\optermsimp \; r) \; (\nupdates \; s \; r \; |
|
877 [\Some \; ([c], n)])))$ |
|
878 \end{theorem} |
|
879 \begin{proof} |
|
880 By the rewriting steps described in lemma \ref{ntimesClosedFormsSteps}. |
|
881 \end{proof} |
|
882 |
|
883 \begin{lemma}\label{nupdatesNLeqN} |
|
884 For an element $o$ in $\textit{set} \; (\nupdates \; s \; r \; |
|
885 [\Some \; ([c], n)])$, either $o = \None$, or $o = \Some |
|
886 \; (s', m)$ for some string $s'$ and number $m \leq n$. |
|
887 \end{lemma} |
|
888 |
|
889 |
|
890 \begin{lemma}\label{ntimesClosedFormListElemShape} |
|
891 For any element $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; ( |
|
892 \nupdates \; s \; r \; [\Some \; ([c], n)]))$, |
|
893 we have that $r'$ is either $\ZERO$ or $r \backslash_{rsimps} s' \cdot |
|
894 r^{\{m\}}$ for some string $s'$ and number $m \leq n$. |
|
895 \end{lemma} |
|
896 \begin{proof} |
|
897 Using lemma \ref{nupdatesNLeqN}. |
|
898 \end{proof} |
|
899 |
|
900 \begin{theorem}\label{ntimesClosedFormBounded} |
|
901 Assuming that for any string $s$, $\llbracket r \backslash_{rsimps} s |
|
902 \rrbracket_r \leq N$ holds, then we have that\\ |
|
903 $\llbracket r^{\{n+1\}} \backslash_{rsimps} s \rrbracket_r \leq |
|
904 \textit{max} \; (c_N+1)* (N + \llbracket r^{\{n\}} \rrbracket+1)$, |
|
905 where $c_N = \textit{card} \; (\textit{sizeNregex} \; ( |
|
906 N + \llbracket r^{\{n\}} \rrbracket_r+1))$. |
|
907 \end{theorem} |
|
908 \begin{proof} |
|
909 We have that for all regular expressions $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; ( |
|
910 \nupdates \; s \; r \; [\Some \; ([c], n)]))$, |
|
911 $r'$'s size is less than or equal to $N + \llbracket r^{\{n\}} |
|
912 \rrbracket_r + 1$ |
|
913 because $r'$ can only be either a $\ZERO$ or $r \backslash_{rsimps} s' \cdot |
|
914 r^{\{m\}}$ for some string $s'$ and number |
|
915 $m \leq n$ (lemma \ref{ntimesClosedFormListElemShape}). |
|
916 In addition, we know that the list |
|
917 $\map \; (\optermsimp \; r) \; ( |
|
918 \nupdates \; s \; r \; [\Some \; ([c], n)])$'s size is at most |
|
919 $c_N = \textit{card} \; |
|
920 (\sizeNregex \; ((N + \llbracket r^{\{n\}} \rrbracket) + 1))$. |
|
921 This gives us $\llbracket \llbracket r \backslash_{rsimps} \;s \llbracket_r |
|
922 \leq N * c_N$. |
|
923 \end{proof} |
|
924 |
|
925 |
|
926 Assuming we have that |
|
927 \begin{center} |
|
928 $\rderssimp{r^*}{s} = \rsimp{(\sum \map \; (\lambda s'. r\backslash s' \cdot r^*) \; Ss)}$. |
|
929 \end{center} |
|
930 holds. |
|
931 The idea of $\starupdate$ and $\starupdates$ |
|
932 is to update $Ss$ when another |
|
933 derivative is taken on $\rderssimp{r^*}{s}$ |
|
934 w.r.t a character $c$ and a string $s'$ |
|
935 respectively. |
|
936 Both $\starupdate$ and $\starupdates$ take three arguments as input: |
|
937 the new character $c$ or string $s$ to take derivative with, |
|
938 the regular expression |
|
939 $r$ under the star $r^*$, and the |
|
940 list of strings $Ss$ for the derivative $r^* \backslash s$ |
|
941 up until this point |
|
942 such that |
|
943 \begin{center} |
|
944 $(r^*) \backslash s = \sum_{s' \in sSet} (r\backslash s') \cdot r^*$ |
|
945 \end{center} |
|
946 is satisfied. |
|
947 |
|
948 Functions $\starupdate$ and $\starupdates$ characterise what the |
|
949 star derivatives will look like once ``straightened out'' into lists. |
|
950 The helper functions for such operations will be similar to |
|
951 $\sflat{\_}$, $\sflataux{\_}$ and $\sflataux{\_}$, which we defined for sequence. |
|
952 We use similar symbols to denote them, with a $*$ subscript to mark the difference. |
|
953 \begin{center} |
|
954 \begin{tabular}{lcl} |
|
955 $\hflataux{r_1 + r_2}$ & $\dn$ & $\hflataux{r_1} @ \hflataux{r_2}$\\ |
|
956 $\hflataux{r}$ & $\dn$ & $[r]$ |
|
957 \end{tabular} |
|
958 \end{center} |
|
959 |
|
960 \begin{center} |
|
961 \begin{tabular}{lcl} |
|
962 $\hflat{r_1 + r_2}$ & $\dn$ & $\sum (\hflataux {r_1} @ \hflataux {r_2}) $\\ |
|
963 $\hflat{r}$ & $\dn$ & $r$ |
|
964 \end{tabular} |
|
965 \end{center} |
|
966 \noindent |
|
967 These definitions are tailor-made for dealing with alternatives that have |
|
968 originated from a star's derivatives. |
|
969 A typical star derivative always has the structure of a balanced binary tree: |
|
970 \begin{center} |
|
971 $(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + |
|
972 (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ |
|
973 \end{center} |
|
974 All of the nested structures of alternatives |
|
975 generated from derivatives are binary, and therefore |
|
976 $\hflat{\_}$ and $\hflataux{\_}$ only deal with binary alternatives. |
|
977 $\hflat{\_}$ ``untangles'' like the following: |
|
978 \[ |
|
979 \sum ((r_1 + r_2) + (r_3 + r_4)) + \ldots \; |
|
980 \stackrel{\hflat{\_}}{\longrightarrow} \; |
|
981 \RALTS{[r_1, r_2, \ldots, r_n]} |
|
982 \] |
|
983 Here is a lemma stating the recursive property of $\starupdate$ and $\starupdates$, |
|
984 with the helpers $\hflat{\_}$ and $\hflataux{\_}$\footnote{The function $\textit{concat}$ takes a list of lists |
|
985 and merges each of the element lists to form a flattened list.}: |
|
986 \begin{lemma}\label{stupdateInduct1} |
|
987 \mbox |
|
988 For a list of strings $Ss$, the following hold. |
|
989 \begin{itemize} |
|
990 \item |
|
991 If we do a derivative on the terms |
|
992 $r\backslash_r s \cdot r^*$ (where $s$ is taken from the list $Ss$), |
|
993 the result will be the same as if we apply $\starupdate$ to $Ss$. |
|
994 \begin{center} |
|
995 \begin{tabular}{c} |
|
996 $\textit{concat} \; (\map \; (\hflataux{\_} \circ ( (\_\backslash_r x) |
|
997 \circ (\lambda s.\;\; (r \backslash_r s) \cdot r^*)))\; Ss )\; |
|
998 $\\ |
|
999 \\ |
|
1000 $=$ \\ |
|
1001 \\ |
|
1002 $\map \; (\lambda s. (r \backslash_r s) \cdot (r^*)) \; |
|
1003 (\starupdate \; x \; r \; Ss)$ |
|
1004 \end{tabular} |
|
1005 \end{center} |
|
1006 \item |
|
1007 $\starupdates$ is ``composable'' w.r.t a derivative. |
|
1008 It piggybacks the character $x$ to the tail of the string |
|
1009 $xs$. |
|
1010 \begin{center} |
|
1011 \begin{tabular}{c} |
|
1012 $\textit{concat} \; (\map \; \hflataux{\_} \; |
|
1013 (\map \; (\_\backslash_r x) \; |
|
1014 (\map \; (\lambda s.\;\; (r \backslash_r s) \cdot |
|
1015 (r^*) ) \; (\starupdates \; xs \; r \; Ss))))$\\ |
|
1016 \\ |
|
1017 $=$\\ |
|
1018 \\ |
|
1019 $\map \; (\lambda s.\;\; (r\backslash_r s) \cdot (r^*)) \; |
|
1020 (\starupdates \; (xs @ [x]) \; r \; Ss)$ |
|
1021 \end{tabular} |
|
1022 \end{center} |
|
1023 \end{itemize} |
|
1024 \end{lemma} |
|
1025 |
|
1026 \begin{proof} |
|
1027 Part 1 is by induction on $Ss$. |
|
1028 Part 2 is by induction on $xs$, where $Ss$ is left to take arbitrary values. |
|
1029 \end{proof} |
|
1030 |
|
1031 |
|
1032 Like $\textit{createdBySequence}$, we need |
|
1033 a predicate for ``star-created'' regular expressions: |
|
1034 \begin{center} |
|
1035 \begin{mathpar} |
|
1036 \inferrule{\mbox{}}{ \textit{createdByStar}\; \RSEQ{ra}{\RSTAR{rb}} } |
|
1037 |
|
1038 \inferrule{ \textit{createdByStar} \; r_1\; \land \; \textit{createdByStar} \; r_2 }{\textit{createdByStar} \; (r_1 + r_2) } |
|
1039 \end{mathpar} |
|
1040 \end{center} |
|
1041 \noindent |
|
1042 All regular expressions created by taking derivatives of |
|
1043 $r_1 \cdot (r_2)^*$ satisfy the $\textit{createdByStar}$ predicate: |
|
1044 \begin{lemma}\label{starDersCbs} |
|
1045 $\textit{createdByStar} \; ((r_1 \cdot r_2^*) \backslash_r s) $ holds. |
|
1046 \end{lemma} |
|
1047 \begin{proof} |
|
1048 By a reverse induction on $s$. |
|
1049 \end{proof} |
|
1050 If a regular expression conforms to the shape of a star's derivative, |
|
1051 then we can push an application of $\hflataux{\_}$ inside a derivative of it: |
|
1052 \begin{lemma}\label{hfauPushin} |
|
1053 If $\textit{createdByStar} \; r$ holds, then |
|
1054 \begin{center} |
|
1055 $\hflataux{r \backslash_r c} = \textit{concat} \; ( |
|
1056 \map \; \hflataux{\_} (\map \; (\_\backslash_r c) \;(\hflataux{r})))$ |
|
1057 \end{center} |
|
1058 holds. |
|
1059 \end{lemma} |
|
1060 \begin{proof} |
|
1061 By an induction on the inductivev cases of $\textit{createdByStar}$. |
|
1062 \end{proof} |
|
1063 %This is not entirely true for annotated regular expressions: |
|
1064 %%TODO: bsimp bders \neq bderssimp |
|
1065 %\begin{center} |
|
1066 % $(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$ |
|
1067 %\end{center} |
|
1068 %For bit-codes, the order in which simplification is applied |
|
1069 %might cause a difference in the location they are placed. |
|
1070 %If we want something like |
|
1071 %\begin{center} |
|
1072 % $\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$ |
|
1073 %\end{center} |
|
1074 %Some "canonicalization" procedure is required, |
|
1075 %which either pushes all the common bitcodes to nodes |
|
1076 %as senior as possible: |
|
1077 %\begin{center} |
|
1078 % $_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $ |
|
1079 %\end{center} |
|
1080 %or does the reverse. However bitcodes are not of interest if we are talking about |
|
1081 %the $\llbracket r \rrbracket$ size of a regex. |
|
1082 %Therefore for the ease and simplicity of producing a |
|
1083 %proof for a size bound, we are happy to restrict ourselves to |
|
1084 %unannotated regular expressions, and obtain such equalities as |
|
1085 %TODO: rsimp sflat |
|
1086 % The simplification of a flattened out regular expression, provided it comes |
|
1087 %from the derivative of a star, is the same as the one nested. |
|
1088 |
|
1089 |
|
1090 |
|
1091 Now we introduce an inductive property |
|
1092 for $\starupdate$ and $\hflataux{\_}$. |
|
1093 \begin{lemma}\label{starHfauInduct} |
|
1094 If we do derivatives of $r^*$ |
|
1095 with a string that starts with $c$, |
|
1096 then flatten it out, |
|
1097 we obtain a list |
|
1098 of the shape $\sum_{s' \in sS} (r\backslash_r s') \cdot r^*$, |
|
1099 where $sS = \starupdates \; s \; r \; [[c]]$. Namely, |
|
1100 \begin{center} |
|
1101 $\hflataux{(\rders{( (\rder{c}{r_0})\cdot(r_0^*))}{s})} = |
|
1102 \map \; (\lambda s_1. (r_0 \backslash_r s_1) \cdot (r_0^*)) \; |
|
1103 (\starupdates \; s \; r_0 \; [[c]])$ |
|
1104 \end{center} |
|
1105 holds. |
|
1106 \end{lemma} |
|
1107 \begin{proof} |
|
1108 By an induction on $s$, the inductive cases |
|
1109 being $[]$ and $s@[c]$. The lemmas \ref{hfauPushin} and \ref{starDersCbs} are used. |
|
1110 \end{proof} |
|
1111 \noindent |
|
1112 |
|
1113 $\hflataux{\_}$ has a similar effect as $\textit{flatten}$: |
|
1114 \begin{lemma}\label{hflatauxGrewrites} |
|
1115 $a :: rs \grewrites \hflataux{a} @ rs$ |
|
1116 \end{lemma} |
|
1117 \begin{proof} |
|
1118 By induction on $a$. $rs$ is set to take arbitrary values. |
|
1119 \end{proof} |
|
1120 It is also not surprising that $\textit{rsimp}$ will cover |
|
1121 the effect of $\hflataux{\_}$: |
|
1122 \begin{lemma}\label{cbsHfauRsimpeq1} |
|
1123 $\rsimp{(r_1 + r_2)} = \rsimp{(\RALTS{\hflataux{r_1} @ \hflataux{r_2}})}$ |
|
1124 \end{lemma} |
|
1125 |
|
1126 \begin{proof} |
|
1127 By using the rewriting relation $\rightsquigarrow$ |
|
1128 \end{proof} |
|
1129 And from this we obtain a proof that a star's derivative will be the same |
|
1130 as if it had all its nested alternatives created during deriving being flattened out: |
|
1131 For example, |
|
1132 \begin{lemma}\label{hfauRsimpeq2} |
|
1133 $\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$ |
|
1134 \end{lemma} |
|
1135 \begin{proof} |
|
1136 By structural induction on $r$, where the induction rules |
|
1137 are these of $\createdByStar{\_}$. |
|
1138 Lemma \ref{cbsHfauRsimpeq1} is used in the inductive case. |
|
1139 \end{proof} |
|
1140 |
|
1141 |
|
1142 %Here is a corollary that states the lemma in |
|
1143 %a more intuitive way: |
|
1144 %\begin{corollary} |
|
1145 % $\hflataux{r^* \backslash_r (c::xs)} = \map \; (\lambda s. (r \backslash_r s) \cdot |
|
1146 % (r^*))\; (\starupdates \; c\; r\; [[c]])$ |
|
1147 %\end{corollary} |
|
1148 %\noindent |
|
1149 %Note that this is also agnostic of the simplification |
|
1150 %function we defined, and is therefore of more general interest. |
|
1151 |
|
1152 Together with the rewriting relation |
|
1153 \begin{lemma}\label{starClosedForm6Hrewrites} |
|
1154 We have the following set of rewriting relations or equalities: |
|
1155 \begin{itemize} |
|
1156 \item |
|
1157 $\textit{rsimp} \; (r^* \backslash_r (c::s)) |
|
1158 \sequal |
|
1159 \sum \; ( ( \sum (\lambda s. (r\backslash_r s) \cdot r^*) \; ( |
|
1160 \starupdates \; s \; r \; [ c::[]] ) ) )$ |
|
1161 \item |
|
1162 $r \backslash_{rsimp} (c::s) = \textit{rsimp} \; ( ( |
|
1163 \sum ( (\map \; (\lambda s_1. (r\backslash s_1) \; r^*) \; |
|
1164 (\starupdates \;s \; r \; [ c::[] ])))))$ |
|
1165 \item |
|
1166 $\sum ( (\map \; (\lambda s. (r\backslash s) \; r^*) \; Ss)) |
|
1167 \sequal |
|
1168 \sum ( (\map \; (\lambda s. \textit{rsimp} \; (r\backslash s) \; |
|
1169 r^*) \; Ss) )$ |
|
1170 \item |
|
1171 $\map \; (\lambda s. (\rsimp{r \backslash_r s}) \cdot (r^*)) \; Ss |
|
1172 \scfrewrites |
|
1173 \map \; (\lambda s. (\rsimp{r \backslash_r s}) \cdot (r^*)) \; Ss$ |
|
1174 \item |
|
1175 $( ( \sum ( ( \map \ (\lambda s. \;\; |
|
1176 (\textit{rsimp} \; (r \backslash_r s)) \cdot r^*) \; (\starupdates \; |
|
1177 s \; r \; [ c::[] ])))))$\\ |
|
1178 $\sequal$\\ |
|
1179 $( ( \sum ( ( \map \ (\lambda s. \;\; |
|
1180 ( r \backslash_{rsimp} s)) \cdot r^*) \; (\starupdates \; |
|
1181 s \; r \; [ c::[] ]))))$\\ |
|
1182 \end{itemize} |
|
1183 \end{lemma} |
|
1184 \begin{proof} |
|
1185 Part 1 leads to part 2. |
|
1186 The rest of them are routine. |
|
1187 \end{proof} |
|
1188 \noindent |
|
1189 Next the closed form for star regular expressions can be derived: |
|
1190 \begin{theorem}\label{starClosedForm} |
|
1191 $\rderssimp{r^*}{c::s} = |
|
1192 \rsimp{ |
|
1193 (\sum (\map \; (\lambda s. (\rderssimp{r}{s})\cdot r^*) \; |
|
1194 (\starupdates \; s\; r \; [[c]]) |
|
1195 ) |
|
1196 ) |
|
1197 } |
|
1198 $ |
|
1199 \end{theorem} |
|
1200 \begin{proof} |
|
1201 By an induction on $s$. |
|
1202 The lemmas \ref{rsimpIdem}, \ref{starHfauInduct}, \ref{starClosedForm6Hrewrites} |
|
1203 and \ref{hfauRsimpeq2} |
|
1204 are used. |
|
1205 In \ref{starClosedForm6Hrewrites}, the equalities are |
|
1206 used to link the LHS and RHS. |
|
1207 \end{proof} |
|
1208 |
|
1209 |
|
1210 |
|
1211 |
546 The closed form for them looks like: |
1212 The closed form for them looks like: |
547 %\begin{center} |
1213 %\begin{center} |
548 % \begin{tabular}{llrclll} |
1214 % \begin{tabular}{llrclll} |
549 % $r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & \\ |
1215 % $r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & \\ |
550 % $\textit{rsimp}$ & $($ & $ |
1216 % $\textit{rsimp}$ & $($ & $ |