592 |
589 |
593 |
590 |
594 %---------------------------------------------------------------------------------------- |
591 %---------------------------------------------------------------------------------------- |
595 % SECTION 2 |
592 % SECTION 2 |
596 %---------------------------------------------------------------------------------------- |
593 %---------------------------------------------------------------------------------------- |
597 |
|
598 \section{Bounded Repetitions} |
|
599 We have promised in chapter \ref{Introduction} |
|
600 that our lexing algorithm can potentially be extended |
|
601 to handle bounded repetitions |
|
602 in natural and elegant ways. |
|
603 Now we fulfill our promise by adding support for |
|
604 the ``exactly-$n$-times'' bounded regular expression $r^{\{n\}}$. |
|
605 We add clauses in our derivatives-based lexing algorithms (with simplifications) |
|
606 introduced in chapter \ref{Bitcoded2}. |
|
607 |
|
608 \subsection{Augmented Definitions} |
|
609 There are a number of definitions that need to be augmented. |
|
610 The most notable one would be the POSIX rules for $r^{\{n\}}$: |
|
611 \begin{center} |
|
612 \begin{mathpar} |
|
613 \inferrule{\forall v \in vs_1. \vdash v:r \land |
|
614 |v| \neq []\\ \forall v \in vs_2. \vdash v:r \land |v| = []\\ |
|
615 \textit{length} \; (vs_1 @ vs_2) = n}{\textit{Stars} \; |
|
616 (vs_1 @ vs_2) : r^{\{n\}} } |
|
617 \end{mathpar} |
|
618 \end{center} |
|
619 As Ausaf had pointed out \cite{Ausaf}, |
|
620 sometimes empty iterations have to be taken to get |
|
621 a match with exactly $n$ repetitions, |
|
622 and hence the $vs_2$ part. |
|
623 |
|
624 Another important definition would be the size: |
|
625 \begin{center} |
|
626 \begin{tabular}{lcl} |
|
627 $\llbracket r^{\{n\}} \rrbracket_r$ & $\dn$ & |
|
628 $\llbracket r \rrbracket_r + n$\\ |
|
629 \end{tabular} |
|
630 \end{center} |
|
631 \noindent |
|
632 Arguably we should use $\log \; n$ for the size because |
|
633 the number of digits increase logarithmically w.r.t $n$. |
|
634 For simplicity we choose to add the counter directly to the size. |
|
635 |
|
636 The derivative w.r.t a bounded regular expression |
|
637 is given as |
|
638 \begin{center} |
|
639 \begin{tabular}{lcl} |
|
640 $r^{\{n\}} \backslash_r c$ & $\dn$ & |
|
641 $r\backslash_r c \cdot r^{\{n-1\}} \;\; \textit{if} \; n \geq 1$\\ |
|
642 & & $\RZERO \;\quad \quad\quad \quad |
|
643 \textit{otherwise}$\\ |
|
644 \end{tabular} |
|
645 \end{center} |
|
646 \noindent |
|
647 For brevity, we sometimes use NTIMES to refer to bounded |
|
648 regular expressions. |
|
649 The $\mkeps$ function clause for NTIMES would be |
|
650 \begin{center} |
|
651 \begin{tabular}{lcl} |
|
652 $\mkeps \; r^{\{n\}} $ & $\dn$ & $\Stars \; |
|
653 (\textit{replicate} \; n\; (\mkeps \; r))$ |
|
654 \end{tabular} |
|
655 \end{center} |
|
656 \noindent |
|
657 The injection looks like |
|
658 \begin{center} |
|
659 \begin{tabular}{lcl} |
|
660 $\inj \; r^{\{n\}} \; c\; (\Seq \;v \; (\Stars \; vs)) $ & |
|
661 $\dn$ & $\Stars \; |
|
662 ((\inj \; r \;c \;v ) :: vs)$ |
|
663 \end{tabular} |
|
664 \end{center} |
|
665 \noindent |
|
666 |
|
667 |
|
668 \subsection{Proofs for the Augmented Lexing Algorithm} |
|
669 We need to maintain two proofs with the additional $r^{\{n\}}$ |
|
670 construct: the |
|
671 correctness proof in chapter \ref{Bitcoded2}, |
|
672 and the finiteness proof in chapter \ref{Finite}. |
|
673 |
|
674 \subsubsection{Correctness Proof Augmentation} |
|
675 The correctness of $\textit{lexer}$ and $\textit{blexer}$ with bounded repetitions |
|
676 have been proven by Ausaf and Urban\cite{AusafDyckhoffUrban2016}. |
|
677 As they have commented, once the definitions are in place, |
|
678 the proofs given for the basic regular expressions will extend to |
|
679 bounded regular expressions, and there are no ``surprises''. |
|
680 We confirm this point because the correctness theorem would also |
|
681 extend without surprise to $\blexersimp$. |
|
682 The rewrite rules such as $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$ and so on |
|
683 do not need to be changed, |
|
684 and only a few lemmas such as lemma \ref{fltsPreserves} need to be adjusted to |
|
685 add one more line which can be solved by sledgehammer |
|
686 to solve the $r^{\{n\}}$ inductive case. |
|
687 |
|
688 |
|
689 \subsubsection{Finiteness Proof Augmentation} |
|
690 The bounded repetitions are |
|
691 very similar to stars, and therefore the treatment |
|
692 is similar, with minor changes to handle some slight complications |
|
693 when the counter reaches 0. |
|
694 The exponential growth is similar: |
|
695 \begin{center} |
|
696 \begin{tabular}{ll} |
|
697 $r^{\{n\}} $ & $\longrightarrow_{\backslash c}$\\ |
|
698 $(r\backslash c) \cdot |
|
699 r^{\{n - 1\}}*$ & $\longrightarrow_{\backslash c'}$\\ |
|
700 \\ |
|
701 $r \backslash cc' \cdot r^{\{n - 2\}}* + |
|
702 r \backslash c' \cdot r^{\{n - 1\}}*$ & |
|
703 $\longrightarrow_{\backslash c''}$\\ |
|
704 \\ |
|
705 $(r_1 \backslash cc'c'' \cdot r^{\{n-3\}}* + |
|
706 r \backslash c''\cdot r^{\{n-1\}}) + |
|
707 (r \backslash c'c'' \cdot r^{\{n-2\}}* + |
|
708 r \backslash c'' \cdot r^{\{n-1\}}*)$ & |
|
709 $\longrightarrow_{\backslash c'''}$ \\ |
|
710 \\ |
|
711 $\ldots$\\ |
|
712 \end{tabular} |
|
713 \end{center} |
|
714 Again, we assume that $r\backslash c$, $r \backslash cc'$ and so on |
|
715 are all nullable. |
|
716 The flattened list of terms for $r^{\{n\}} \backslash_{rs} s$ |
|
717 \begin{center} |
|
718 $[r_1 \backslash cc'c'' \cdot r^{\{n-3\}}*,\; |
|
719 r \backslash c''\cdot r^{\{n-1\}}, \; |
|
720 r \backslash c'c'' \cdot r^{\{n-2\}}*, \; |
|
721 r \backslash c'' \cdot r^{\{n-1\}}*,\; \ldots ]$ |
|
722 \end{center} |
|
723 that comes from |
|
724 \begin{center} |
|
725 $(r_1 \backslash cc'c'' \cdot r^{\{n-3\}}* + |
|
726 r \backslash c''\cdot r^{\{n-1\}}) + |
|
727 (r \backslash c'c'' \cdot r^{\{n-2\}}* + |
|
728 r \backslash c'' \cdot r^{\{n-1\}}*)+ \ldots$ |
|
729 \end{center} |
|
730 are made of sequences with different tails, where the counters |
|
731 might differ. |
|
732 The observation for maintaining the bound is that |
|
733 these counters never exceed $n$, the original |
|
734 counter. With the number of counters staying finite, |
|
735 $\rDistinct$ will deduplicate and keep the list finite. |
|
736 We introduce this idea as a lemma once we describe all |
|
737 the necessary helper functions. |
|
738 |
|
739 Similar to the star case, we want |
|
740 \begin{center} |
|
741 $\rderssimp{r^{\{n\}}}{s} = \rsimp{\sum rs}$. |
|
742 \end{center} |
|
743 where $rs$ |
|
744 shall be in the form of |
|
745 $\map \; f \; Ss$, where $f$ is a function and |
|
746 $Ss$ a list of objects to act on. |
|
747 For star, the object's datatype is string. |
|
748 The list of strings $Ss$ |
|
749 is generated using functions |
|
750 $\starupdate$ and $\starupdates$. |
|
751 The function that takes a string and returns a regular expression |
|
752 is the anonymous function $ |
|
753 (\lambda s'. \; r\backslash s' \cdot r^{\{m\}})$. |
|
754 In the NTIMES setting, |
|
755 the $\starupdate$ and $\starupdates$ functions are replaced by |
|
756 $\textit{nupdate}$ and $\textit{nupdates}$: |
|
757 \begin{center} |
|
758 \begin{tabular}{lcl} |
|
759 $\nupdate \; c \; r \; [] $ & $\dn$ & $[]$\\ |
|
760 $\nupdate \; c \; r \; |
|
761 (\Some \; (s, \; n + 1) \; :: \; Ss)$ & $\dn$ & %\\ |
|
762 $\textit{if} \; |
|
763 (\rnullable \; (r \backslash_{rs} s))$ \\ |
|
764 & & $\;\;\textit{then} |
|
765 \;\; \Some \; (s @ [c], n + 1) :: \Some \; ([c], n) :: ( |
|
766 \nupdate \; c \; r \; Ss)$ \\ |
|
767 & & $\textit{else} \;\; \Some \; (s @ [c], n+1) :: ( |
|
768 \nupdate \; c \; r \; Ss)$\\ |
|
769 $\nupdate \; c \; r \; (\textit{None} :: Ss)$ & $\dn$ & |
|
770 $(\None :: \nupdate \; c \; r \; Ss)$\\ |
|
771 & & \\ |
|
772 %\end{tabular} |
|
773 %\end{center} |
|
774 %\begin{center} |
|
775 %\begin{tabular}{lcl} |
|
776 $\nupdates \; [] \; r \; Ss$ & $\dn$ & $Ss$\\ |
|
777 $\nupdates \; (c :: cs) \; r \; Ss$ & $\dn$ & $\nupdates \; cs \; r \; ( |
|
778 \nupdate \; c \; r \; Ss)$ |
|
779 \end{tabular} |
|
780 \end{center} |
|
781 \noindent |
|
782 which take into account when a subterm |
|
783 \begin{center} |
|
784 $r \backslash_s s \cdot r^{\{n\}}$ |
|
785 \end{center} |
|
786 counter $n$ |
|
787 is 0, and therefore expands to |
|
788 \begin{center} |
|
789 $r \backslash_s (s@[c]) \cdot r^{\{n\}} \;+ |
|
790 \; \ZERO$ |
|
791 \end{center} |
|
792 after taking a derivative. |
|
793 The object now has type |
|
794 \begin{center} |
|
795 $\textit{option} \;(\textit{string}, \textit{nat})$ |
|
796 \end{center} |
|
797 and therefore the function for converting such an option into |
|
798 a regular expression term is called $\opterm$: |
|
799 |
|
800 \begin{center} |
|
801 \begin{tabular}{lcl} |
|
802 $\opterm \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\ |
|
803 & & $\;\;\Some \; (s, n) \Rightarrow |
|
804 (r\backslash_{rs} s)\cdot r^{\{n\}}$\\ |
|
805 & & $\;\;\None \Rightarrow |
|
806 \ZERO$\\ |
|
807 \end{tabular} |
|
808 \end{center} |
|
809 \noindent |
|
810 Put together, the list $\map \; f \; Ss$ is instantiated as |
|
811 \begin{center} |
|
812 $\map \; (\opterm \; r) \; (\nupdates \; s \; r \; |
|
813 [\Some \; ([c], n)])$. |
|
814 \end{center} |
|
815 For the closed form to be bounded, we would like |
|
816 simplification to be applied to each term in the list. |
|
817 Therefore we introduce some variants of $\opterm$, |
|
818 which help conveniently express the rewriting steps |
|
819 needed in the closed form proof. |
|
820 \begin{center} |
|
821 \begin{tabular}{lcl} |
|
822 $\optermOsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\ |
|
823 & & $\;\;\Some \; (s, n) \Rightarrow |
|
824 \textit{rsimp} \; ((r\backslash_{rs} s)\cdot r^{\{n\}})$\\ |
|
825 & & $\;\;\None \Rightarrow |
|
826 \ZERO$\\ |
|
827 \\ |
|
828 $\optermosimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\ |
|
829 & & $\;\;\Some \; (s, n) \Rightarrow |
|
830 (\textit{rsimp} \; (r\backslash_{rs} s)) |
|
831 \cdot r^{\{n\}}$\\ |
|
832 & & $\;\;\None \Rightarrow |
|
833 \ZERO$\\ |
|
834 \\ |
|
835 $\optermsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\ |
|
836 & & $\;\;\Some \; (s, n) \Rightarrow |
|
837 (r\backslash_{rsimps} s)\cdot r^{\{n\}}$\\ |
|
838 & & $\;\;\None \Rightarrow |
|
839 \ZERO$\\ |
|
840 \end{tabular} |
|
841 \end{center} |
|
842 |
|
843 |
|
844 For a list of |
|
845 $\textit{option} \;(\textit{string}, \textit{nat})$ elements, |
|
846 we define the highest power for it recursively: |
|
847 \begin{center} |
|
848 \begin{tabular}{lcl} |
|
849 $\hpa \; [] \; n $ & $\dn$ & $n$\\ |
|
850 $\hpa \; (\None :: os) \; n $ & $\dn$ & $\hpa \; os \; n$\\ |
|
851 $\hpa \; (\Some \; (s, n) :: os) \; m$ & $\dn$ & |
|
852 $\hpa \;os \; (\textit{max} \; n\; m)$\\ |
|
853 \\ |
|
854 $\hpower \; rs $ & $\dn$ & $\hpa \; rs \; 0$\\ |
|
855 \end{tabular} |
|
856 \end{center} |
|
857 |
|
858 Now the intuition that an NTIMES regular expression's power |
|
859 does not increase can be easily expressed as |
|
860 \begin{lemma}\label{nupdatesMono2} |
|
861 $\hpower \; (\nupdates \;s \; r \; [\Some \; ([c], n)]) \leq n$ |
|
862 \end{lemma} |
|
863 \begin{proof} |
|
864 Note that the power is non-increasing after a $\nupdate$ application: |
|
865 \begin{center} |
|
866 $\hpa \;\; (\nupdate \; c \; r \; Ss)\;\; m \leq |
|
867 \hpa\; \; Ss \; m$. |
|
868 \end{center} |
|
869 This is also the case for $\nupdates$: |
|
870 \begin{center} |
|
871 $\hpa \;\; (\nupdates \; s \; r \; Ss)\;\; m \leq |
|
872 \hpa\; \; Ss \; m$. |
|
873 \end{center} |
|
874 Therefore we have that |
|
875 \begin{center} |
|
876 $\hpower \;\; (\nupdates \; s \; r \; Ss) \leq |
|
877 \hpower \;\; Ss$ |
|
878 \end{center} |
|
879 which leads to the lemma being proven. |
|
880 |
|
881 \end{proof} |
|
882 |
|
883 |
|
884 We also define the inductive rules for |
|
885 the shape of derivatives of the NTIMES regular expressions:\\[-3em] |
|
886 \begin{center} |
|
887 \begin{mathpar} |
|
888 \inferrule{\mbox{}}{\cbn \;\ZERO} |
|
889 |
|
890 \inferrule{\mbox{}}{\cbn \; \; r_a \cdot (r^{\{n\}})} |
|
891 |
|
892 \inferrule{\cbn \; r_1 \;\; \; \cbn \; r_2}{\cbn \; r_1 + r_2} |
|
893 |
|
894 \inferrule{\cbn \; r}{\cbn \; r + \ZERO} |
|
895 \end{mathpar} |
|
896 \end{center} |
|
897 \noindent |
|
898 A derivative of NTIMES fits into the shape described by $\cbn$: |
|
899 \begin{lemma}\label{ntimesDersCbn} |
|
900 $\cbn \; ((r' \cdot r^{\{n\}}) \backslash_{rs} s)$ holds. |
|
901 \end{lemma} |
|
902 \begin{proof} |
|
903 By a reverse induction on $s$. |
|
904 For the inductive case, note that if $\cbn \; r$ holds, |
|
905 then $\cbn \; (r\backslash_r c)$ holds. |
|
906 \end{proof} |
|
907 \noindent |
|
908 In addition, for $\cbn$-shaped regular expressioins, one can flatten |
|
909 them: |
|
910 \begin{lemma}\label{ntimesHfauPushin} |
|
911 If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} = |
|
912 \textit{concat} \; (\map \; \hflataux{\map \; (\_\backslash_r c) \; |
|
913 (\hflataux{r})})$ |
|
914 \end{lemma} |
|
915 \begin{proof} |
|
916 By an induction on the inductive cases of $\cbn$. |
|
917 \end{proof} |
|
918 \noindent |
|
919 This time we do not need to define the flattening functions for NTIMES only, |
|
920 because $\hflat{\_}$ and $\hflataux{\_}$ work on NTIMES already. |
|
921 \begin{lemma}\label{ntimesHfauInduct} |
|
922 $\hflataux{( (r\backslash_r c) \cdot r^{\{n\}}) \backslash_{rsimps} s} = |
|
923 \map \; (\opterm \; r) \; (\nupdates \; s \; r \; [\Some \; ([c], n)])$ |
|
924 \end{lemma} |
|
925 \begin{proof} |
|
926 By a reverse induction on $s$. |
|
927 The lemmas \ref{ntimesHfauPushin} and \ref{ntimesDersCbn} are used. |
|
928 \end{proof} |
|
929 \noindent |
|
930 We have a recursive property for NTIMES with $\nupdate$ |
|
931 similar to that for STAR, |
|
932 and one for $\nupdates $ as well: |
|
933 \begin{lemma}\label{nupdateInduct1} |
|
934 \mbox{} |
|
935 \begin{itemize} |
|
936 \item |
|
937 \begin{center} |
|
938 $\textit{concat} \; (\map \; (\hflataux{\_} \circ ( |
|
939 \opterm \; r)) \; Ss) = \map \; (\opterm \; r) \; (\nupdate \; |
|
940 c \; r \; Ss)$\\ |
|
941 \end{center} |
|
942 holds. |
|
943 \item |
|
944 \begin{center} |
|
945 $\textit{concat} \; (\map \; \hflataux{\_}\; |
|
946 \map \; (\_\backslash_r x) \; |
|
947 (\map \; (\opterm \; r) \; (\nupdates \; xs \; r \; Ss)))$\\ |
|
948 $=$\\ |
|
949 $\map \; (\opterm \; r) \; (\nupdates \;(xs@[x]) \; r\;Ss)$ |
|
950 \end{center} |
|
951 holds. |
|
952 \end{itemize} |
|
953 \end{lemma} |
|
954 \begin{proof} |
|
955 (i) is by an induction on $Ss$. |
|
956 (ii) is by an induction on $xs$. |
|
957 \end{proof} |
|
958 \noindent |
|
959 The $\nString$ predicate is defined for conveniently |
|
960 expressing that there are no empty strings in the |
|
961 $\Some \;(s, n)$ elements generated by $\nupdate$: |
|
962 \begin{center} |
|
963 \begin{tabular}{lcl} |
|
964 $\nString \; \None$ & $\dn$ & $ \textit{true}$\\ |
|
965 $\nString \; (\Some \; ([], n))$ & $\dn$ & $ \textit{false}$\\ |
|
966 $\nString \; (\Some \; (c::s, n))$ & $\dn$ & $ \textit{true}$\\ |
|
967 \end{tabular} |
|
968 \end{center} |
|
969 \begin{lemma}\label{nupdatesNonempty} |
|
970 If for all elements $o \in \textit{set} \; Ss$, |
|
971 $\nString \; o$ holds, the we have that |
|
972 for all elements $o' \in \textit{set} \; (\nupdates \; s \; r \; Ss)$, |
|
973 $\nString \; o'$ holds. |
|
974 \end{lemma} |
|
975 \begin{proof} |
|
976 By an induction on $s$, where $Ss$ is set to vary over all possible values. |
|
977 \end{proof} |
|
978 |
|
979 \noindent |
|
980 |
|
981 \begin{lemma}\label{ntimesClosedFormsSteps} |
|
982 The following list of equalities or rewriting relations hold:\\ |
|
983 (i) $r^{\{n+1\}} \backslash_{rsimps} (c::s) = |
|
984 \textit{rsimp} \; (\sum (\map \; (\opterm \;r \;\_) \; (\nupdates \; |
|
985 s \; r \; [\Some \; ([c], n)])))$\\ |
|
986 (ii) |
|
987 \begin{center} |
|
988 $\sum (\map \; (\opterm \; r) \; (\nupdates \; s \; r \; [ |
|
989 \Some \; ([c], n)]))$ \\ $ \sequal$\\ |
|
990 $\sum (\map \; (\textit{rsimp} \circ (\opterm \; r))\; (\nupdates \; |
|
991 s\;r \; [\Some \; ([c], n)]))$\\ |
|
992 \end{center} |
|
993 (iii) |
|
994 \begin{center} |
|
995 $\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \; |
|
996 ([c], n)]))$\\ |
|
997 $\sequal$\\ |
|
998 $\sum \;(\map \; (\optermsimp r) \; (\nupdates \; s \; r \; [\Some \; |
|
999 ([c], n)])) $\\ |
|
1000 \end{center} |
|
1001 (iv) |
|
1002 \begin{center} |
|
1003 $\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \; |
|
1004 ([c], n)])) $ \\ $\sequal$\\ |
|
1005 $\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \; |
|
1006 ([c], n)])) $\\ |
|
1007 \end{center} |
|
1008 (v) |
|
1009 \begin{center} |
|
1010 $\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \; |
|
1011 ([c], n)])) $ \\ $\sequal$\\ |
|
1012 $\sum \; (\map \; (\textit{rsimp} \circ (\opterm \; r)) \; |
|
1013 (\nupdates \; s \; r \; [\Some \; ([c], n)]))$ |
|
1014 \end{center} |
|
1015 \end{lemma} |
|
1016 \begin{proof} |
|
1017 Routine. |
|
1018 (iii) and (iv) make use of the fact that all the strings $s$ |
|
1019 inside $\Some \; (s, m)$ which are elements of the list |
|
1020 $\nupdates \; s\;r\;[\Some\; ([c], n)]$ are non-empty, |
|
1021 which is from lemma \ref{nupdatesNonempty}. |
|
1022 Once the string in $o = \Some \; (s, n)$ is |
|
1023 nonempty, $\optermsimp \; r \;o$, |
|
1024 $\optermosimp \; r \; o$ and $\optermosimp \; \; o$ are guaranteed |
|
1025 to be equal. |
|
1026 (v) uses \ref{nupdateInduct1}. |
|
1027 \end{proof} |
|
1028 \noindent |
|
1029 Now we are ready to present the closed form for NTIMES: |
|
1030 \begin{theorem}\label{ntimesClosedForm} |
|
1031 The derivative of $r^{\{n+1\}}$ can be described as an alternative |
|
1032 containing a list |
|
1033 of terms:\\ |
|
1034 $r^{\{n+1\}} \backslash_{rsimps} (c::s) = \textit{rsimp} \; ( |
|
1035 \sum (\map \; (\optermsimp \; r) \; (\nupdates \; s \; r \; |
|
1036 [\Some \; ([c], n)])))$ |
|
1037 \end{theorem} |
|
1038 \begin{proof} |
|
1039 By the rewriting steps described in lemma \ref{ntimesClosedFormsSteps}. |
|
1040 \end{proof} |
|
1041 \noindent |
|
1042 The key observation for bounding this closed form |
|
1043 is that the counter on $r^{\{n\}}$ will |
|
1044 only decrement during derivatives: |
|
1045 \begin{lemma}\label{nupdatesNLeqN} |
|
1046 For an element $o$ in $\textit{set} \; (\nupdates \; s \; r \; |
|
1047 [\Some \; ([c], n)])$, either $o = \None$, or $o = \Some |
|
1048 \; (s', m)$ for some string $s'$ and number $m \leq n$. |
|
1049 \end{lemma} |
|
1050 \noindent |
|
1051 The proof is routine and therefore omitted. |
|
1052 This allows us to say what kind of terms |
|
1053 are in the list $\textit{set} \; (\map \; (\optermsimp \; r) \; ( |
|
1054 \nupdates \; s \; r \; [\Some \; ([c], n)]))$: |
|
1055 only $\ZERO_r$s or a sequence with the tail an $r^{\{m\}}$ |
|
1056 with a small $m$: |
|
1057 \begin{lemma}\label{ntimesClosedFormListElemShape} |
|
1058 For any element $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; ( |
|
1059 \nupdates \; s \; r \; [\Some \; ([c], n)]))$, |
|
1060 we have that $r'$ is either $\ZERO$ or $r \backslash_{rsimps} s' \cdot |
|
1061 r^{\{m\}}$ for some string $s'$ and number $m \leq n$. |
|
1062 \end{lemma} |
|
1063 \begin{proof} |
|
1064 Using lemma \ref{nupdatesNLeqN}. |
|
1065 \end{proof} |
|
1066 |
|
1067 \begin{theorem}\label{ntimesClosedFormBounded} |
|
1068 Assuming that for any string $s$, $\llbracket r \backslash_{rsimps} s |
|
1069 \rrbracket_r \leq N$ holds, then we have that\\ |
|
1070 $\llbracket r^{\{n+1\}} \backslash_{rsimps} s \rrbracket_r \leq |
|
1071 \textit{max} \; (c_N+1)* (N + \llbracket r^{\{n\}} \rrbracket+1)$, |
|
1072 where $c_N = \textit{card} \; (\textit{sizeNregex} \; ( |
|
1073 N + \llbracket r^{\{n\}} \rrbracket_r+1))$. |
|
1074 \end{theorem} |
|
1075 \begin{proof} |
|
1076 We have that for all regular expressions $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; ( |
|
1077 \nupdates \; s \; r \; [\Some \; ([c], n)]))$, |
|
1078 $r'$'s size is less than or equal to $N + \llbracket r^{\{n\}} |
|
1079 \rrbracket_r + 1$ |
|
1080 because $r'$ can only be either a $\ZERO$ or $r \backslash_{rsimps} s' \cdot |
|
1081 r^{\{m\}}$ for some string $s'$ and number |
|
1082 $m \leq n$ (lemma \ref{ntimesClosedFormListElemShape}). |
|
1083 In addition, we know that the list |
|
1084 $\map \; (\optermsimp \; r) \; ( |
|
1085 \nupdates \; s \; r \; [\Some \; ([c], n)])$'s size is at most |
|
1086 $c_N = \textit{card} \; |
|
1087 (\sizeNregex \; ((N + \llbracket r^{\{n\}} \rrbracket) + 1))$. |
|
1088 This gives us $\llbracket r \backslash_{rsimps} \;s \rrbracket_r |
|
1089 \leq N * c_N$. |
|
1090 \end{proof} |
|
1091 |
|
1092 We aim to formalise the correctness and size bound |
|
1093 for constructs like $r^{\{\ldots n\}}$, $r^{\{n \ldots\}}$ |
|
1094 and so on, which is still work in progress. |
|
1095 They should more or less follow the same recipe described in this section. |
|
1096 Once we know about how to deal with them recursively using suitable auxiliary |
|
1097 definitions, we are able to routinely establish the proofs. |
|
1098 |
594 |
1099 |
595 |
1100 %The closed form for them looks like: |
596 %The closed form for them looks like: |
1101 %%\begin{center} |
597 %%\begin{center} |
1102 %% \begin{tabular}{llrclll} |
598 %% \begin{tabular}{llrclll} |