836 |
849 |
837 Lists of one-step rewrite rules for flattening and de-duplicating |
850 Lists of one-step rewrite rules for flattening and de-duplicating |
838 a list of regular expressions ($\grewrite$): |
851 a list of regular expressions ($\grewrite$): |
839 \begin{center} |
852 \begin{center} |
840 \begin{mathpar} |
853 \begin{mathpar} |
841 \inferrule{}{\RZERO :: rs \frewrite rs \\} |
854 \inferrule{}{\RZERO :: rs \grewrite rs \\} |
842 |
855 |
843 \inferrule{}{(\sum rs) :: rs_a \frewrite rs @ rs_a \\} |
856 \inferrule{}{(\sum rs) :: rs_a \grewrite rs @ rs_a \\} |
844 |
857 |
845 \inferrule{rs_1 \frewrite rs_2}{r :: rs_1 \frewrite r :: rs_2} |
858 \inferrule{rs_1 \grewrite rs_2}{r :: rs_1 \grewrite r :: rs_2} |
846 |
859 |
847 \inferrule[dB]{}{rs_a @ [a] @ rs_b @[a] @ rs_c \grewrite rs_a @ [a] @ rsb @ rsc} |
860 \inferrule[dB]{}{rs_a @ [a] @ rs_b @[a] @ rs_c \grewrite rs_a @ [a] @ rsb @ rsc} |
848 \end{mathpar} |
861 \end{mathpar} |
849 \end{center} |
862 \end{center} |
850 |
863 |
851 \noindent |
864 \noindent |
852 The reason why we take the trouble of defining |
865 The reason why we take the trouble of defining |
853 two separate list rewriting definitions $\frewrite$ and $\grewrite$ |
866 two separate list rewriting definitions $\frewrite$ and $\grewrite$ |
854 is that sometimes $\grewrites$ is slightly too powerful |
867 is to separate the two stages of simplification: flattening and de-duplicating. |
855 for proving equivalence. |
868 Sometimes $\grewrites$ is slightly too powerful |
|
869 so we would rather use $\frewrites$ which makes certain rewriting steps |
|
870 more straightforward to prove. |
856 For example, when proving the closed-form for the alternative regular expression, |
871 For example, when proving the closed-form for the alternative regular expression, |
857 one of the rewriting steps would be: |
872 one of the rewriting steps would be: |
858 \begin{lemma} |
873 \begin{lemma} |
859 $\sum (\rDistinct \; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \; \varnothing) \sequal |
874 $\sum (\rDistinct \;\; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \;\; \varnothing) \sequal |
860 \sum (\rDistinct \; (\rflts \; (\map \; (\_ \backslash x) \; rs)) \; \varnothing) |
875 \sum (\rDistinct \;\; (\rflts \; (\map \; (\_ \backslash x) \; rs)) \;\; \varnothing) |
861 $ |
876 $ |
862 \end{lemma} |
877 \end{lemma} |
863 \noindent |
878 \noindent |
864 Proving this is by first showing |
879 Proving this is by first showing |
865 \begin{lemma} |
880 \begin{lemma}\label{earlyLaterDerFrewrites} |
866 $\map \; (\_ \backslash x) \; (\rflts \; rs) \frewrites |
881 $\map \; (\_ \backslash x) \; (\rflts \; rs) \frewrites |
867 \rflts \; (\map \; (\_ \backslash x) \; rs)$ |
882 \rflts \; (\map \; (\_ \backslash x) \; rs)$ |
868 \end{lemma} |
883 \end{lemma} |
869 \noindent |
884 \noindent |
870 and then using lemma |
885 and then using lemma |
871 \begin{lemma}\label{frewritesSimpeq} |
886 \begin{lemma}\label{frewritesSimpeq} |
872 If $rs_1 \frewrites rs_2 $, then $\sum (\rDistinct \; rs_1 \; \varnothing) \sequal |
887 If $rs_1 \frewrites rs_2 $, then $\sum (\rDistinct \; rs_1 \; \varnothing) \sequal |
873 \sum (\rDistinct \; rs_2 \; {})$. |
888 \sum (\rDistinct \; rs_2 \; \varnothing)$. |
874 \end{lemma} |
889 \end{lemma} |
875 . But this trick will not work for $\grewrites$: |
890 \noindent |
|
891 is a piece of cake. |
|
892 But this trick will not work for $\grewrites$. |
|
893 For example, a rewriting step in proving |
|
894 closed forms is: |
|
895 \begin{center} |
|
896 $\rsimp{(\rsimpalts \; (\map \; (\_ \backslash x) \; (\rdistinct{(\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs}))))}{\varnothing})))}$\\ |
|
897 $=$ \\ |
|
898 $\rsimp{(\rsimpalts \; (\rdistinct{(\map \; (\_ \backslash x) \; (\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs})))) ) }{\varnothing}))} $ |
|
899 \noindent |
|
900 \end{center} |
|
901 For this one would hope to have a rewriting relation between the two lists involved, |
|
902 similar to \ref{earlyLaterDerFrewrites}. However, it turns out that |
876 \begin{center} |
903 \begin{center} |
877 $\map \; (\_ \backslash x) \; (\rDistinct \; rs \; rset) \grewrites \rDistinct \; (\map \; |
904 $\map \; (\_ \backslash x) \; (\rDistinct \; rs \; rset) \grewrites \rDistinct \; (\map \; |
878 (\_ \backslash x) \; rs) \; (\map \; (\_ \backslash x) \; rset)$ |
905 (\_ \backslash x) \; rs) \; ( rset \backslash x)$ |
879 \end{center} |
906 \end{center} |
880 \noindent |
907 \noindent |
881 does \emph{not} hold. |
908 does $\mathbf{not}$ hold in general. |
882 |
909 For this rewriting step we will introduce some slightly more cumbersome |
883 |
910 proof technique in later sections. |
884 |
911 The point is that $\frewrite$ |
885 |
912 allows us to prove equivalence in a straightforward two-step method that is |
886 %this is for closed form for alts section, talk about that later------------------------------ |
913 not possible for $\grewrite$, thereby reducing the complexity of the entire proof. |
887 \begin{comment} |
914 |
888 \begin{center} |
915 |
889 $\rsimp{(\rsimpalts \; (\map \; (\_ \backslash x) \; (\rdistinct{(\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs}))))}{\varnothing})))} = |
916 \subsubsection{Terms That Can Be Rewritten Using $\hrewrites$, $\grewrites$, and $\frewrites$} |
890 \rsimp{(\rsimpalts \; (\rdistinct{(\map \; (\_ \backslash x) \; (\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs})))) ) }{\varnothing}))} $ |
917 We present in the below lemma a few pairs of terms that are rewritable via |
891 \end{center} |
918 $\grewrites$: |
|
919 \begin{lemma}\label{gstarRdistinctGeneral} |
|
920 \begin{itemize} |
|
921 \item |
|
922 $rs_1 @ rs \grewrites rs_1 @ (\rDistinct \; rs \; rs_1)$ |
|
923 \item |
|
924 $rs \grewrites \rDistinct \; rs \; \varnothing$ |
|
925 \item |
|
926 $rs_a @ (\rDistinct \; rs \; rs_a) \grewrites rs_a @ (\rDistinct \; |
|
927 rs \; (\{\RZERO\} \cup rs_a))$ |
|
928 \item |
|
929 $rs \;\; @ \;\; \rDistinct \; rs_a \; rset \grewrites rs @ \rDistinct \; rs_a \; |
|
930 (rest \cup rs)$ |
|
931 |
|
932 \end{itemize} |
|
933 \end{lemma} |
|
934 \noindent |
|
935 If a pair of terms $rs_1, rs_2$ are rewritable via $\grewrites$ to each other, |
|
936 then they are equivalent under $\rsimp{}$: |
|
937 \begin{lemma}\label{grewritesSimpalts} |
|
938 If $rs_1 \grewrites rs_2$, then |
|
939 we have the following equivalence hold: |
|
940 \begin{itemize} |
|
941 \item |
|
942 $\sum rs_1 \sequal \sum rs_2$ |
|
943 \item |
|
944 $\rsimpalts \; rs_1 \sequal \rsimpalts \; rs_2$ |
|
945 \end{itemize} |
|
946 \end{lemma} |
|
947 \noindent |
|
948 Here are a few connecting lemmas showing that |
|
949 if a list of regular expressions can be rewritten using $\grewrites$ or $\frewrites $ or |
|
950 $\scfrewrites$, |
|
951 then an alternative constructor taking the list can also be rewritten using $\hrewrites$: |
|
952 \begin{lemma} |
|
953 \begin{itemize} |
|
954 \item |
|
955 If $rs \grewrites rs'$ then $\sum rs \hrewrites \sum rs'$. |
|
956 \item |
|
957 If $rs \grewrites rs'$ then $\sum rs \hrewrites \rsimpalts \; rs'$ |
|
958 \item |
|
959 If $rs_1 \scfrewrites rs_2$ then $\sum (rs @ rs_1) \hrewrites \sum (rs @ rs_2)$ |
|
960 \item |
|
961 If $rs_1 \scfrewrites rs_2$ then $\sum rs_1 \hrewrites \sum rs_2$ |
|
962 |
|
963 \end{itemize} |
|
964 \end{lemma} |
|
965 \noindent |
|
966 Here comes the meat of the proof, |
|
967 which says that once two lists are rewritable to each other, |
|
968 then they are equivalent under $\rsimp{}$: |
|
969 \begin{lemma} |
|
970 If $r_1 \hrewrites r_2$ then $r_1 \sequal r_2$. |
|
971 \end{lemma} |
|
972 |
|
973 \noindent |
|
974 And similar to \ref{Bitcoded2} one can preserve rewritability after taking derivative |
|
975 of two regular expressions on both sides: |
|
976 \begin{lemma}\label{interleave} |
|
977 If $r \hrewrites r' $ then $\rder{c}{r} \hrewrites \rder{c}{r'}$ |
|
978 \end{lemma} |
|
979 \noindent |
|
980 This allows proving more $\mathbf{rsimp}$-equivalent terms, involving $\backslash_r$ now. |
|
981 \begin{lemma}\label{insideSimpRemoval} |
|
982 $\rsimp{\rder{c}{\rsimp{r}}} = \rsimp{\rder{c}{r}} $ |
|
983 \end{lemma} |
|
984 \noindent |
|
985 \begin{proof} |
|
986 By \ref{interleave} and \ref{rsimpIdem}. |
|
987 \end{proof} |
|
988 \noindent |
|
989 And this unlocks more equivalent terms: |
|
990 \begin{lemma}\label{Simpders} |
|
991 As corollaries of \ref{insideSimpRemoval}, we have |
|
992 \begin{itemize} |
|
993 \item |
|
994 If $s \neq []$ then $\rderssimp{r}{s} = \rsimp{(\rders \; r \; s)}$. |
|
995 \item |
|
996 $\rsimpalts \; (\map \; (\_ \backslash_r x) \; |
|
997 (\rdistinct{rs}{\varnothing})) \sequal |
|
998 \rsimpalts \; (\rDistinct \; |
|
999 (\map \; (\_ \backslash_r x) rs) \;\varnothing )$ |
|
1000 \end{itemize} |
|
1001 \end{lemma} |
|
1002 \noindent |
|
1003 |
|
1004 Finally, |
|
1005 together with |
|
1006 \begin{lemma}\label{rderRsimpAltsCommute} |
|
1007 $\rder{x}{(\rsimpalts \; rs)} = \rsimpalts \; (\map \; (\rder{x}{\_}) \; rs)$ |
|
1008 \end{lemma} |
|
1009 \noindent |
|
1010 this leads to the first closed form-- |
|
1011 \begin{lemma}\label{altsClosedForm} |
|
1012 \begin{center} |
|
1013 $\rderssimp{(\sum rs)}{s} \sequal |
|
1014 \sum \; (\map \; (\rderssimp{\_}{s}) \; rs)$ |
|
1015 \end{center} |
|
1016 \end{lemma} |
892 |
1017 |
893 \noindent |
1018 \noindent |
894 This is not possible to prove |
1019 \begin{proof} |
895 |
1020 By a reverse induction on the string $s$. |
896 \end{comment} |
1021 One rewriting step, as we mentioned earlier, |
897 %this is for closed form for alts section, talk about that later------------------------------ |
1022 involves |
898 And we define an "grewrite" relation that works on lists: |
1023 \begin{center} |
|
1024 $\rsimpalts \; (\map \; (\_ \backslash x) \; |
|
1025 (\rdistinct{(\rflts \; (\map \; (\rsimp{} \; \circ \; |
|
1026 (\lambda r. \rderssimp{r}{xs}))))}{\varnothing})) |
|
1027 \sequal |
|
1028 \rsimpalts \; (\rdistinct{(\map \; (\_ \backslash x) \; |
|
1029 (\rflts \; (\map \; (\rsimp{} \; \circ \; |
|
1030 (\lambda r. \rderssimp{r}{xs})))) ) }{\varnothing}) $. |
|
1031 \end{center} |
|
1032 This can be proven by a combination of |
|
1033 \ref{grewritesSimpalts}, \ref{gstarRdistinctGeneral}, \ref{rderRsimpAltsCommute}, and |
|
1034 \ref{insideSimpRemoval}. |
|
1035 \end{proof} |
|
1036 \noindent |
|
1037 This closed form has a variant which can be more convenient in later proofs: |
|
1038 \begin{corollary} |
|
1039 If $s \neq []$ then |
|
1040 $\rderssimp \; (\sum \; rs) \; s = |
|
1041 \rsimp{(\sum \; (\map \; \rderssimp{\_}{s} \; rs))}$. |
|
1042 \end{corollary} |
|
1043 \noindent |
|
1044 The harder closed forms are the sequence and star ones. |
|
1045 Before we go on to obtain them, some preliminary definitions |
|
1046 are needed to make proof statements concise. |
|
1047 |
|
1048 \section{"Closed Forms" of regular expressions' derivatives w.r.t strings} |
|
1049 We note that the different ways in which regular expressions are |
|
1050 nested do not matter under $\rsimp{}$: |
|
1051 \begin{center} |
|
1052 To be proven:\\ |
|
1053 $\rsimp{r} \stackrel{?}{=} \rsimp{r'}$ if $r = \sum [r_1, r_2, r_3, \ldots]$ |
|
1054 and $r' =(\ldots ((r_1 + r_2) + r_3) + \ldots)$ |
|
1055 \end{center} |
|
1056 \noindent |
|
1057 This intuition is also echoed by IndianPaper, where they gave |
|
1058 a pencil-and-paper derivation of $(r_1 \cdot r_2)\backslash s$: |
|
1059 \begin{center} |
|
1060 $((r_1 \cdot r_2) \backslash_r c_1) \stackrel{\backslash_r c_2}{=} |
|
1061 ((r_1 \backslash_r c_1) \cdot r_2 + (\delta\; (\rnullable \; r_1),\; r_2 \backslash_r c_1)) |
|
1062 \stackrel{\backslash_r c_2}{=} |
|
1063 ((r_1 \backslash_r c_1c_2 \cdot r_2 + (\delta \; (\rnullable) \; r_1, r_2 \backslash_r c_1c_2) |
|
1064 + (\delta \; (\rnullable \; r_1 \backslash_r c)\; r_2 \backslash_r c_2) |
|
1065 $ |
|
1066 \end{center} |
|
1067 \noindent |
|
1068 The $\delta \; b \; r$ function above turns the entire term into $\RZERO$ |
|
1069 if the boolean condition $b$ evaluates to false, and outputs $r$ otherwise. |
|
1070 The equality in their derivation steps should be interpretated |
|
1071 as language equivalence. To pin down the intuition into rigorous terms, |
|
1072 $\sflat{}$ is used to enable the transformation from |
|
1073 a left-associative nested sequence of alternatives into |
|
1074 a flattened list: |
|
1075 $r = \sum [r_1, r_2, r_3, \ldots] \stackrel{\sflat{}}{\rightarrow} |
|
1076 (\ldots ((r_1 + r_2) + r_3) + \ldots)$ |
|
1077 The definitions $\sflat{}$, $\sflataux{}$ are given below. |
|
1078 |
|
1079 \begin{center} |
|
1080 \begin{tabular}{ccc} |
|
1081 $\sflataux{\AALTS{ }{r :: rs}}$ & $=$ & $\sflataux{r} @ rs$\\ |
|
1082 $\sflataux{\AALTS{ }{[]}}$ & $ = $ & $ []$\\ |
|
1083 $\sflataux r$ & $=$ & $ [r]$ |
|
1084 \end{tabular} |
|
1085 \end{center} |
|
1086 |
|
1087 \begin{center} |
|
1088 \begin{tabular}{ccc} |
|
1089 $\sflat{(\sum r :: rs)}$ & $=$ & $\sum (\sflataux{r} @ rs)$\\ |
|
1090 $\sflat{\sum []}$ & $ = $ & $ \sum []$\\ |
|
1091 $\sflat r$ & $=$ & $ r$ |
|
1092 \end{tabular} |
|
1093 \end{center} |
|
1094 The intuition behind $\sflataux{\_}$ is to break up nested regexes |
|
1095 of the $(\ldots((r_1 + r_2) + r_3) + \ldots )$(left-associated) shape |
|
1096 into a more "balanced" list: $\AALTS{\_}{[r_1,\, r_2 ,\, r_3, \ldots]}$. |
|
1097 It will return the singleton list $[r]$ otherwise. |
|
1098 $\sflat{\_}$ works the same as $\sflataux{\_}$, except that it keeps |
|
1099 the output type a regular expression, not a list. |
|
1100 $\sflataux{\_}$ and $\sflat{\_}$ is only recursive in terms of the |
|
1101 first element of the list of children of |
|
1102 an alternative regex ($\AALTS{ }{rs}$), and is purposefully built for dealing with the regular expression |
|
1103 $r_1 \cdot r_2 \backslash s$. |
|
1104 \subsection{The $\textit{vsuf}$ Function} |
|
1105 Note that $((r_1 \backslash_r c_1c_2 \cdot r_2 + (\delta \; (\rnullable) \; r_1, r_2 \backslash_r c_1c_2) |
|
1106 + (\delta \; (\rnullable \; r_1 \backslash_r c)\; r_2 \backslash_r c_2)$ does not faithfully |
|
1107 represent what would the intermediate derivatives would actually look like. |
|
1108 For example, when $r_1$ and $r_1 \backslash_r c_1$ are nullable, |
|
1109 the regular expression would not look like $(r_1 \backslash_r c_1c_2 + \RZERO ) + \RZERO$, |
|
1110 but actually $r_1\backslash_r c_1c_2$, the redundant $\RZERO$s will not be created in the |
|
1111 first place. |
|
1112 In a closed-form one would want to take into account this |
|
1113 and generate the list of string pairs $(s', s'')$ where $s'@s'' = s$ such that |
|
1114 only $r_1 \backslash s'$ nullable. |
|
1115 With $\sflat{\_}$ and $\sflataux{\_}$ we make |
|
1116 precise what "closed forms" we have for the sequence derivatives and their simplifications, |
|
1117 in other words, how can we construct $(r_1 \cdot r_2) \backslash s$ |
|
1118 and $\bderssimp{(r_1\cdot r_2)}{s}$, |
|
1119 if we are only allowed to use a combination of $r_1 \backslash s'$ ($\bderssimp{r_1}{s'}$) |
|
1120 and $r_2 \backslash s'$ ($\bderssimp{r_2}{s'}$), where $s'$ ranges over |
|
1121 the substring of $s$? |
|
1122 First let's look at a series of derivatives steps on a sequence |
|
1123 regular expression, (assuming) that each time the first |
|
1124 component of the sequence is always nullable): |
|
1125 \begin{center} |
|
1126 |
|
1127 $r_1 \cdot r_2 \quad \longrightarrow_{\backslash c} \quad r_1 \backslash c \cdot r_2 + r_2 \backslash c \quad \longrightarrow_{\backslash c'} \quad (r_1 \backslash cc' \cdot r_2 + r_2 \backslash c') + r_2 \backslash cc' \longrightarrow_{\backslash c''} \quad$\\ |
|
1128 $((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'') + r_2 \backslash cc'c'' \longrightarrow_{\backslash c''} \quad |
|
1129 \ldots$ |
|
1130 |
|
1131 \end{center} |
|
1132 %TODO: cite indian paper |
|
1133 Indianpaper have come up with a slightly more formal way of putting the above process: |
|
1134 \begin{center} |
|
1135 $r_1 \cdot r_2 \backslash (c_1 :: c_2 ::\ldots c_n) \myequiv r_1 \backslash (c_1 :: c_2:: \ldots c_n) + |
|
1136 \delta(\nullable(r_1 \backslash (c_1 :: c_2 \ldots c_{n-1}) ), r_2 \backslash (c_n)) + \ldots |
|
1137 + \delta (\nullable(r_1), r_2\backslash (c_1 :: c_2 ::\ldots c_n))$ |
|
1138 \end{center} |
|
1139 where $\delta(b, r)$ will produce $r$ |
|
1140 if $b$ evaluates to true, |
|
1141 and $\ZERO$ otherwise. |
|
1142 |
|
1143 But the $\myequiv$ sign in the above equation means language equivalence instead of syntactical |
|
1144 equivalence. To make this intuition useful |
|
1145 for a formal proof, we need something |
|
1146 stronger than language equivalence. |
|
1147 With the help of $\sflat{\_}$ we can state the equation in Indianpaper |
|
1148 more rigorously: |
|
1149 \begin{lemma} |
|
1150 $\sflat{(r_1 \cdot r_2) \backslash s} = \RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}$ |
|
1151 \end{lemma} |
|
1152 |
|
1153 The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string: |
|
1154 |
899 \begin{center} |
1155 \begin{center} |
900 \begin{tabular}{lcl} |
1156 \begin{tabular}{lcl} |
901 $ \ZERO :: rs$ & $\rightarrow_g$ & $rs$ |
1157 $\vsuf{[]}{\_} $ & $=$ & $[]$\\ |
902 \end{tabular} |
1158 $\vsuf{c::cs}{r_1}$ & $ =$ & $ \textit{if} (\rnullable{r_1}) \textit{then} \; (\vsuf{cs}{(\rder{c}{r_1})}) @ [c :: cs]$\\ |
903 \end{center} |
1159 && $\textit{else} \; (\vsuf{cs}{(\rder{c}{r_1}) }) $ |
904 |
1160 \end{tabular} |
905 |
1161 \end{center} |
906 |
1162 It takes into account which prefixes $s'$ of $s$ would make $r \backslash s'$ nullable, |
907 With these relations we prove |
1163 and keep a list of suffixes $s''$ such that $s' @ s'' = s$. The list is sorted in |
|
1164 the order $r_2\backslash s''$ appears in $(r_1\cdot r_2)\backslash s$. |
|
1165 In essence, $\vsuf{\_}{\_}$ is doing a "virtual derivative" of $r_1 \cdot r_2$, but instead of producing |
|
1166 the entire result of $(r_1 \cdot r_2) \backslash s$, |
|
1167 it only stores all the $s''$ such that $r_2 \backslash s''$ are occurring terms in $(r_1\cdot r_2)\backslash s$. |
|
1168 |
|
1169 With this we can also add simplifications to both sides and get |
908 \begin{lemma} |
1170 \begin{lemma} |
909 $rs \rightarrow rs' \implies \RALTS{rs} \rightarrow \RALTS{rs'}$ |
1171 $\rsimp{\sflat{(r_1 \cdot r_2) \backslash s} }= \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$ |
910 \end{lemma} |
1172 \end{lemma} |
911 which enables us to prove "closed forms" equalities such as |
1173 Together with the idempotency property of $\rsimp{}$\ref{rsimpIdem}, |
|
1174 %TODO: state the idempotency property of rsimp |
|
1175 it is possible to convert the above lemma to obtain a "closed form" |
|
1176 for derivatives nested with simplification: |
912 \begin{lemma} |
1177 \begin{lemma} |
913 $\sflat{(r_1 \cdot r_2) \backslash s} = \RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\suffix \; s \; r_1 ))}$ |
1178 $\rderssimp{(r_1 \cdot r_2)}{s} = \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$ |
914 \end{lemma} |
1179 \end{lemma} |
915 |
1180 And now the reason we have (1) in section 1 is clear: |
916 But the most involved part of the above lemma is proving the following: |
1181 $\rsize{\rsimp{\RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map \;(r_2 \backslash \_) \; (\vsuf{s}{r_1}))}}}$, |
917 \begin{lemma} |
1182 is equal to $\rsize{\rsimp{\RALTS{ ((r_1 \backslash s) \cdot r_2 :: (\map \; (r_2 \backslash \_) \; (\textit{Suffix}(r1, s))))}}}$ |
918 $\bsimp{\RALTS{rs @ \RALTS{rs_1} @ rs'}} = \bsimp{\RALTS{rs @rs_1 @ rs'}} $ |
1183 as $\vsuf{s}{r_1}$ is one way of expressing the list $\textit{Suffix}( r_1, s)$. |
919 \end{lemma} |
1184 |
920 which is needed in proving things like |
1185 %---------------------------------------------------------------------------------------- |
921 \begin{lemma} |
1186 % SECTION 3 |
922 $r \rightarrow_f r' \implies \rsimp{r} \rightarrow \rsimp{r'}$ |
1187 %---------------------------------------------------------------------------------------- |
923 \end{lemma} |
1188 |
924 |
1189 \section{interaction between $\distinctWith$ and $\flts$} |
925 Fortunately this is provable by induction on the list $rs$ |
1190 Note that it is not immediately obvious that |
|
1191 \begin{center} |
|
1192 $\llbracket \distinctWith (\flts \textit{rs}) = \phi \rrbracket \leq \llbracket \distinctWith \textit{rs} = \phi \rrbracket $. |
|
1193 \end{center} |
|
1194 The intuition is that if we remove duplicates from the $\textit{LHS}$, at least the same amount of |
|
1195 duplicates will be removed from the list $\textit{rs}$ in the $\textit{RHS}$. |
|
1196 |
926 |
1197 |
927 \subsection{A Closed Form for the Sequence Regular Expression} |
1198 \subsection{A Closed Form for the Sequence Regular Expression} |
928 \begin{quote}\it |
1199 \begin{quote}\it |
929 Claim: For regular expressions $r_1 \cdot r_2$, we claim that |
1200 Claim: For regular expressions $r_1 \cdot r_2$, we claim that |
930 \begin{center} |
1201 \begin{center} |
1220 |
1491 |
1221 |
1492 |
1222 %---------------------------------------------------------------------------------------- |
1493 %---------------------------------------------------------------------------------------- |
1223 % SECTION ?? |
1494 % SECTION ?? |
1224 %---------------------------------------------------------------------------------------- |
1495 %---------------------------------------------------------------------------------------- |
1225 |
|
1226 \section{"Closed Forms" of regular expressions' derivatives w.r.t strings} |
|
1227 To embark on getting the "closed forms" of regexes, we first |
|
1228 define a few auxiliary definitions to make expressing them smoothly. |
|
1229 |
|
1230 \begin{center} |
|
1231 \begin{tabular}{ccc} |
|
1232 $\sflataux{\AALTS{ }{r :: rs}}$ & $=$ & $\sflataux{r} @ rs$\\ |
|
1233 $\sflataux{\AALTS{ }{[]}}$ & $ = $ & $ []$\\ |
|
1234 $\sflataux r$ & $=$ & $ [r]$ |
|
1235 \end{tabular} |
|
1236 \end{center} |
|
1237 The intuition behind $\sflataux{\_}$ is to break up nested regexes |
|
1238 of the $(\ldots((r_1 + r_2) + r_3) + \ldots )$(left-associated) shape |
|
1239 into a more "balanced" list: $\AALTS{\_}{[r_1,\, r_2 ,\, r_3, \ldots]}$. |
|
1240 It will return the singleton list $[r]$ otherwise. |
|
1241 |
|
1242 $\sflat{\_}$ works the same as $\sflataux{\_}$, except that it keeps |
|
1243 the output type a regular expression, not a list: |
|
1244 \begin{center} |
|
1245 \begin{tabular}{ccc} |
|
1246 $\sflat{\AALTS{ }{r :: rs}}$ & $=$ & $\sflataux{r} @ rs$\\ |
|
1247 $\sflat{\AALTS{ }{[]}}$ & $ = $ & $ \AALTS{ }{[]}$\\ |
|
1248 $\sflat r$ & $=$ & $ [r]$ |
|
1249 \end{tabular} |
|
1250 \end{center} |
|
1251 $\sflataux{\_}$ and $\sflat{\_}$ is only recursive in terms of the |
|
1252 first element of the list of children of |
|
1253 an alternative regex ($\AALTS{ }{rs}$), and is purposefully built for dealing with the regular expression |
|
1254 $r_1 \cdot r_2 \backslash s$. |
|
1255 |
|
1256 With $\sflat{\_}$ and $\sflataux{\_}$ we make |
|
1257 precise what "closed forms" we have for the sequence derivatives and their simplifications, |
|
1258 in other words, how can we construct $(r_1 \cdot r_2) \backslash s$ |
|
1259 and $\bderssimp{(r_1\cdot r_2)}{s}$, |
|
1260 if we are only allowed to use a combination of $r_1 \backslash s'$ ($\bderssimp{r_1}{s'}$) |
|
1261 and $r_2 \backslash s'$ ($\bderssimp{r_2}{s'}$), where $s'$ ranges over |
|
1262 the substring of $s$? |
|
1263 First let's look at a series of derivatives steps on a sequence |
|
1264 regular expression, (assuming) that each time the first |
|
1265 component of the sequence is always nullable): |
|
1266 \begin{center} |
|
1267 |
|
1268 $r_1 \cdot r_2 \quad \longrightarrow_{\backslash c} \quad r_1 \backslash c \cdot r_2 + r_2 \backslash c \quad \longrightarrow_{\backslash c'} \quad (r_1 \backslash cc' \cdot r_2 + r_2 \backslash c') + r_2 \backslash cc' \longrightarrow_{\backslash c''} \quad$\\ |
|
1269 $((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'') + r_2 \backslash cc'c'' \longrightarrow_{\backslash c''} \quad |
|
1270 \ldots$ |
|
1271 |
|
1272 \end{center} |
|
1273 %TODO: cite indian paper |
|
1274 Indianpaper have come up with a slightly more formal way of putting the above process: |
|
1275 \begin{center} |
|
1276 $r_1 \cdot r_2 \backslash (c_1 :: c_2 ::\ldots c_n) \myequiv r_1 \backslash (c_1 :: c_2:: \ldots c_n) + |
|
1277 \delta(\nullable(r_1 \backslash (c_1 :: c_2 \ldots c_{n-1}) ), r_2 \backslash (c_n)) + \ldots |
|
1278 + \delta (\nullable(r_1), r_2\backslash (c_1 :: c_2 ::\ldots c_n))$ |
|
1279 \end{center} |
|
1280 where $\delta(b, r)$ will produce $r$ |
|
1281 if $b$ evaluates to true, |
|
1282 and $\ZERO$ otherwise. |
|
1283 |
|
1284 But the $\myequiv$ sign in the above equation means language equivalence instead of syntactical |
|
1285 equivalence. To make this intuition useful |
|
1286 for a formal proof, we need something |
|
1287 stronger than language equivalence. |
|
1288 With the help of $\sflat{\_}$ we can state the equation in Indianpaper |
|
1289 more rigorously: |
|
1290 \begin{lemma} |
|
1291 $\sflat{(r_1 \cdot r_2) \backslash s} = \RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}$ |
|
1292 \end{lemma} |
|
1293 |
|
1294 The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string: |
|
1295 |
|
1296 \begin{center} |
|
1297 \begin{tabular}{lcl} |
|
1298 $\vsuf{[]}{\_} $ & $=$ & $[]$\\ |
|
1299 $\vsuf{c::cs}{r_1}$ & $ =$ & $ \textit{if} (\rnullable{r_1}) \textit{then} \; (\vsuf{cs}{(\rder{c}{r_1})}) @ [c :: cs]$\\ |
|
1300 && $\textit{else} \; (\vsuf{cs}{(\rder{c}{r_1}) }) $ |
|
1301 \end{tabular} |
|
1302 \end{center} |
|
1303 It takes into account which prefixes $s'$ of $s$ would make $r \backslash s'$ nullable, |
|
1304 and keep a list of suffixes $s''$ such that $s' @ s'' = s$. The list is sorted in |
|
1305 the order $r_2\backslash s''$ appears in $(r_1\cdot r_2)\backslash s$. |
|
1306 In essence, $\vsuf{\_}{\_}$ is doing a "virtual derivative" of $r_1 \cdot r_2$, but instead of producing |
|
1307 the entire result of $(r_1 \cdot r_2) \backslash s$, |
|
1308 it only stores all the $s''$ such that $r_2 \backslash s''$ are occurring terms in $(r_1\cdot r_2)\backslash s$. |
|
1309 |
|
1310 With this we can also add simplifications to both sides and get |
|
1311 \begin{lemma} |
|
1312 $\rsimp{\sflat{(r_1 \cdot r_2) \backslash s} }= \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$ |
|
1313 \end{lemma} |
|
1314 Together with the idempotency property of $\rsimp{}$\ref{rsimpIdem}, |
|
1315 %TODO: state the idempotency property of rsimp |
|
1316 it is possible to convert the above lemma to obtain a "closed form" |
|
1317 for derivatives nested with simplification: |
|
1318 \begin{lemma} |
|
1319 $\rderssimp{(r_1 \cdot r_2)}{s} = \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$ |
|
1320 \end{lemma} |
|
1321 And now the reason we have (1) in section 1 is clear: |
|
1322 $\rsize{\rsimp{\RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map \;(r_2 \backslash \_) \; (\vsuf{s}{r_1}))}}}$, |
|
1323 is equal to $\rsize{\rsimp{\RALTS{ ((r_1 \backslash s) \cdot r_2 :: (\map \; (r_2 \backslash \_) \; (\textit{Suffix}(r1, s))))}}}$ |
|
1324 as $\vsuf{s}{r_1}$ is one way of expressing the list $\textit{Suffix}( r_1, s)$. |
|
1325 |
|
1326 %---------------------------------------------------------------------------------------- |
|
1327 % SECTION 3 |
|
1328 %---------------------------------------------------------------------------------------- |
|
1329 |
|
1330 \section{interaction between $\distinctWith$ and $\flts$} |
|
1331 Note that it is not immediately obvious that |
|
1332 \begin{center} |
|
1333 $\llbracket \distinctWith (\flts \textit{rs}) = \phi \rrbracket \leq \llbracket \distinctWith \textit{rs} = \phi \rrbracket $. |
|
1334 \end{center} |
|
1335 The intuition is that if we remove duplicates from the $\textit{LHS}$, at least the same amount of |
|
1336 duplicates will be removed from the list $\textit{rs}$ in the $\textit{RHS}$. |
|
1337 |
|
1338 |
1496 |
1339 %----------------------------------- |
1497 %----------------------------------- |
1340 % SECTION syntactic equivalence under simp |
1498 % SECTION syntactic equivalence under simp |
1341 %----------------------------------- |
1499 %----------------------------------- |
1342 \section{Syntactic Equivalence Under $\simp$} |
1500 \section{Syntactic Equivalence Under $\simp$} |