ChengsongTanPhdThesis/Chapters/Finite.tex
changeset 564 3cbcd7cda0a9
parent 562 57e33978e55d
child 576 3e1b699696b6
equal deleted inserted replaced
562:57e33978e55d 564:3cbcd7cda0a9
   207 We omit these functions, as they are routine. Please refer to the formalisation
   207 We omit these functions, as they are routine. Please refer to the formalisation
   208 (in file BasicIdentities.thy) for the exact definition.
   208 (in file BasicIdentities.thy) for the exact definition.
   209 With $\rrexp$ the size caclulation of annotated regular expressions'
   209 With $\rrexp$ the size caclulation of annotated regular expressions'
   210 simplification and derivatives can be done by the size of their unlifted 
   210 simplification and derivatives can be done by the size of their unlifted 
   211 counterpart with the unlifted version of simplification and derivatives applied.
   211 counterpart with the unlifted version of simplification and derivatives applied.
   212 \begin{lemma}
   212 \begin{lemma}\label{sizeRelations}
   213 	The following equalities hold:
   213 	The following equalities hold:
   214 	\begin{itemize}
   214 	\begin{itemize}
   215 		\item
   215 		\item
   216 			$\asize{\bsimp{a}} = \rsize{\rsimp{\rerase{a}}}$
   216 			$\asize{\bsimp{a}} = \rsize{\rsimp{\rerase{a}}}$
   217 		\item
   217 		\item
  1242     	
  1242     	
  1243 	\end{center}
  1243 	\end{center}
  1244 \end{corollary}
  1244 \end{corollary}
  1245 \noindent
  1245 \noindent
  1246 \subsection{Closed Forms for Star Regular Expressions}
  1246 \subsection{Closed Forms for Star Regular Expressions}
  1247 We use a similar technique as $r_1 \cdot r_2$ case,
  1247 We have shown how to control the size of the sequence regular expression $r_1\cdot r_2$ using
  1248 generating 
  1248 the "closed form" of $(r_1 \cdot r_2) \backslash s$ and then 
       
  1249 the property of the $\distinct$ function.
       
  1250 Now we try to get a bound on $r^* \backslash s$ as well.
       
  1251 Again, we first look at how a star's derivatives evolve, if they grow maximally: 
       
  1252 \begin{center}
       
  1253 
       
  1254 $r^* \quad \longrightarrow_{\backslash c}  \quad   (r\backslash c)  \cdot  r^* \quad \longrightarrow_{\backslash c'}  \quad
       
  1255 r \backslash cc'  \cdot r^* + r \backslash c' \cdot r^*  \quad \longrightarrow_{\backslash c''} \quad 
       
  1256 (r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*)   \quad \longrightarrow_{\backslash c'''}
       
  1257 \quad \ldots$
       
  1258 
       
  1259 \end{center}
       
  1260 When we have a string $s = c :: c' :: c'' \ldots$  such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$, 
       
  1261 $r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable,
       
  1262 the number of terms in $r^* \backslash s$ will grow exponentially, causing the size
       
  1263 of the derivatives $r^* \backslash s$ to grow exponentially, even if we do not 
       
  1264 count the possible size explosions of $r \backslash c$ themselves.
       
  1265 
       
  1266 Thanks to $\rflts$ and $\rDistinct$, we are able to open up regexes like
       
  1267 $(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + 
       
  1268 (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ 
       
  1269 into $\RALTS{[r_1 \backslash cc'c'' \cdot r^*, r \backslash c'', 
       
  1270 r \backslash c'c'' \cdot r^*, r \backslash c'' \cdot r^*]}$
       
  1271 and then de-duplicate terms of the form $r\backslash s' \cdot r^*$ ($s'$ being a substring of $s$).
       
  1272 This allows us to use a similar technique as $r_1 \cdot r_2$ case,
       
  1273 where the crux is to get an equivalent form of 
       
  1274 $\rderssimp{r^*}{s}$ with shape $\rsimp{\sum rs}$.
       
  1275 This requires generating 
  1249 all possible sub-strings $s'$ of $s$
  1276 all possible sub-strings $s'$ of $s$
  1250 such that $r\backslash s' \cdot r^*$ will appear 
  1277 such that $r\backslash s' \cdot r^*$ will appear 
  1251 as a term in $(r^*) \backslash s$.
  1278 as a term in $(r^*) \backslash s$.
  1252 The first function we define is a single-step
  1279 The first function we define is a single-step
  1253 updating function $\starupdate$, which takes three arguments as input:
  1280 updating function $\starupdate$, which takes three arguments as input:
  1302 		$\hflat{r}$ & $\dn$ & $r$
  1329 		$\hflat{r}$ & $\dn$ & $r$
  1303 	\end{tabular}
  1330 	\end{tabular}
  1304 \end{center}
  1331 \end{center}
  1305 \noindent
  1332 \noindent
  1306 %MAYBE TODO: introduce createdByStar
  1333 %MAYBE TODO: introduce createdByStar
       
  1334 Again these definitions are tailor-made for dealing with alternatives that have
       
  1335 originated from a star's derivatives, so we do not attempt to open up all possible 
       
  1336 regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2
       
  1337 elements.
       
  1338 We give a predicate for such "star-created" regular expressions:
       
  1339 \begin{center}
       
  1340 \begin{tabular}{lcr}
       
  1341          &    &       $\createdByStar{(\RSEQ{ra}{\RSTAR{rb}}) }$\\
       
  1342  $\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$
       
  1343  \end{tabular}
       
  1344  \end{center}
       
  1345  
       
  1346  These definitions allows us the flexibility to talk about 
       
  1347  regular expressions in their most convenient format,
       
  1348  for example, flattened out $\RALTS{[r_1, r_2, \ldots, r_n]} $
       
  1349  instead of binary-nested: $((r_1 + r_2) + (r_3 + r_4)) + \ldots$.
       
  1350  These definitions help express that certain classes of syntatically 
       
  1351  distinct regular expressions are actually the same under simplification.
       
  1352  This is not entirely true for annotated regular expressions: 
       
  1353  %TODO: bsimp bders \neq bderssimp
       
  1354  \begin{center}
       
  1355  $(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
       
  1356  \end{center}
       
  1357  For bit-codes, the order in which simplification is applied
       
  1358  might cause a difference in the location they are placed.
       
  1359  If we want something like
       
  1360  \begin{center}
       
  1361  $\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$
       
  1362  \end{center}
       
  1363  Some "canonicalization" procedure is required,
       
  1364  which either pushes all the common bitcodes to nodes
       
  1365   as senior as possible:
       
  1366   \begin{center}
       
  1367   $_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $
       
  1368   \end{center}
       
  1369  or does the reverse. However bitcodes are not of interest if we are talking about
       
  1370  the $\llbracket r \rrbracket$ size of a regex.
       
  1371  Therefore for the ease and simplicity of producing a
       
  1372  proof for a size bound, we are happy to restrict ourselves to 
       
  1373  unannotated regular expressions, and obtain such equalities as
       
  1374  \begin{lemma}
       
  1375  $\rsimp{r_1 + r_2} = \rsimp{\RALTS{\hflataux{r_1} @ \hflataux{r_2}}}$
       
  1376  \end{lemma}
       
  1377  
       
  1378  \begin{proof}
       
  1379  By using the rewriting relation $\rightsquigarrow$
       
  1380  \end{proof}
       
  1381  %TODO: rsimp sflat
       
  1382 And from this we obtain a proof that a star's derivative will be the same
       
  1383 as if it had all its nested alternatives created during deriving being flattened out:
       
  1384  For example,
       
  1385  \begin{lemma}
       
  1386  $\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
       
  1387  \end{lemma}
       
  1388  \begin{proof}
       
  1389  By structural induction on $r$, where the induction rules are these of $\createdByStar{_}$.
       
  1390  \end{proof}
       
  1391 % The simplification of a flattened out regular expression, provided it comes
       
  1392 %from the derivative of a star, is the same as the one nested.
       
  1393  
       
  1394 
       
  1395 
  1307 We first introduce an inductive property
  1396 We first introduce an inductive property
  1308 for $\starupdate$ and $\hflataux{\_}$, 
  1397 for $\starupdate$ and $\hflataux{\_}$, 
  1309 it says if we do derivatives of $r^*$
  1398 it says if we do derivatives of $r^*$
  1310 with a string that starts with $c$,
  1399 with a string that starts with $c$,
  1311 then flatten it out,
  1400 then flatten it out,
  1514 \end{center}
  1603 \end{center}
  1515 
  1604 
  1516 We reason similarly for  $\STAR$.
  1605 We reason similarly for  $\STAR$.
  1517 The inductive hypothesis is
  1606 The inductive hypothesis is
  1518 $\exists N. \forall s. \; \llbracket \rderssimp{r}{s} \rrbracket \leq N$.
  1607 $\exists N. \forall s. \; \llbracket \rderssimp{r}{s} \rrbracket \leq N$.
  1519 Let $\n_r = \llbracket r^* \rrbracket_r$.
  1608 Let $n_r = \llbracket r^* \rrbracket_r$.
  1520 When $s = c :: cs$ is not empty,
  1609 When $s = c :: cs$ is not empty,
  1521 \begin{center}
  1610 \begin{center}
  1522 \begin{tabular}{lcll}
  1611 \begin{tabular}{lcll}
  1523 & & $ \llbracket   \rderssimp{r^* }{c::cs} \rrbracket_r $\\
  1612 & & $ \llbracket   \rderssimp{r^* }{c::cs} \rrbracket_r $\\
  1524 & $ = $ & $\llbracket \rsimp{(\sum (\map \; (\lambda s. (r \backslash_{rsimp} s) \cdot r^*) \; (\starupdates\; 
  1613 & $ = $ & $\llbracket \rsimp{(\sum (\map \; (\lambda s. (r \backslash_{rsimp} s) \cdot r^*) \; (\starupdates\; 
  1570 \end{tabular}
  1659 \end{tabular}
  1571 \end{center}
  1660 \end{center}
  1572 (4), (8), and (12) are all the inductive cases proven.
  1661 (4), (8), and (12) are all the inductive cases proven.
  1573 \end{proof}
  1662 \end{proof}
  1574 
  1663 
       
  1664 
       
  1665 \begin{corollary}
       
  1666 For any regex $a$, $\exists N_r. \forall s. \; \rsize{\bderssimp{a}{s}} \leq N_r$
       
  1667 \end{corollary}
       
  1668 \begin{proof}
       
  1669 	By \ref{sizeRelations}.
       
  1670 \end{proof}
  1575 \noindent
  1671 \noindent
  1576 
  1672 
  1577 %-----------------------------------
  1673 %-----------------------------------
  1578 %	SECTION 2
  1674 %	SECTION 2
  1579 %-----------------------------------
  1675 %-----------------------------------
  1889 \end{center}
  1985 \end{center}
  1890 
  1986 
  1891 %----------------------------------------------------------------------------------------
  1987 %----------------------------------------------------------------------------------------
  1892 %	SECTION 4
  1988 %	SECTION 4
  1893 %----------------------------------------------------------------------------------------
  1989 %----------------------------------------------------------------------------------------
  1894 \section{A Bound for the Star Regular Expression}
       
  1895 We have shown how to control the size of the sequence regular expression $r_1\cdot r_2$ using
       
  1896 the "closed form" of $(r_1 \cdot r_2) \backslash s$ and then 
       
  1897 the property of the $\distinct$ function.
       
  1898 Now we try to get a bound on $r^* \backslash s$ as well.
       
  1899 Again, we first look at how a star's derivatives evolve, if they grow maximally: 
       
  1900 \begin{center}
       
  1901 
       
  1902 $r^* \quad \longrightarrow_{\backslash c}  \quad   (r\backslash c)  \cdot  r^* \quad \longrightarrow_{\backslash c'}  \quad
       
  1903 r \backslash cc'  \cdot r^* + r \backslash c' \cdot r^*  \quad \longrightarrow_{\backslash c''} \quad 
       
  1904 (r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*)   \quad \longrightarrow_{\backslash c'''}
       
  1905 \quad \ldots$
       
  1906 
       
  1907 \end{center}
       
  1908 When we have a string $s = c :: c' :: c'' \ldots$  such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$, 
       
  1909 $r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable,
       
  1910 the number of terms in $r^* \backslash s$ will grow exponentially, causing the size
       
  1911 of the derivatives $r^* \backslash s$ to grow exponentially, even if we do not 
       
  1912 count the possible size explosions of $r \backslash c$ themselves.
       
  1913 
       
  1914 Thanks to $\flts$ and $\distinctWith$, we are able to open up regexes like
       
  1915 $(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ 
       
  1916 into $\RALTS{[r_1 \backslash cc'c'' \cdot r^*, r \backslash c'', r \backslash c'c'' \cdot r^*, r \backslash c'' \cdot r^*]}$
       
  1917 and then de-duplicate terms of the form $r\backslash s' \cdot r^*$ ($s'$ being a substring of $s$).
       
  1918 For this we define $\hflataux{\_}$ and $\hflat{\_}$, similar to $\sflataux{\_}$ and $\sflat{\_}$:
       
  1919 %TODO: definitions of  and \hflataux \hflat
       
  1920  \begin{center}  
       
  1921  \begin{tabular}{ccc}
       
  1922  $\hflataux{r_1 + r_2}$ & $=$ & $\hflataux{r_1} @ \hflataux{r_2}$\\
       
  1923 $\hflataux r$ & $=$ & $ [r]$
       
  1924 \end{tabular}
       
  1925 \end{center}
       
  1926 
       
  1927  \begin{center}  
       
  1928  \begin{tabular}{ccc}
       
  1929  $\hflat{r_1 + r_2}$ & $=$ & $\RALTS{\hflataux{r_1} @ \hflataux{r_2}}$\\
       
  1930 $\hflat r$ & $=$ & $ r$
       
  1931 \end{tabular}
       
  1932 \end{center}s
       
  1933 Again these definitions are tailor-made for dealing with alternatives that have
       
  1934 originated from a star's derivatives, so we don't attempt to open up all possible 
       
  1935 regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2
       
  1936 elements.
       
  1937 We give a predicate for such "star-created" regular expressions:
       
  1938 \begin{center}
       
  1939 \begin{tabular}{lcr}
       
  1940          &    &       $\createdByStar{(\RSEQ{ra}{\RSTAR{rb}}) }$\\
       
  1941  $\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$
       
  1942  \end{tabular}
       
  1943  \end{center}
       
  1944  
       
  1945  These definitions allows us the flexibility to talk about 
       
  1946  regular expressions in their most convenient format,
       
  1947  for example, flattened out $\RALTS{[r_1, r_2, \ldots, r_n]} $
       
  1948  instead of binary-nested: $((r_1 + r_2) + (r_3 + r_4)) + \ldots$.
       
  1949  These definitions help express that certain classes of syntatically 
       
  1950  distinct regular expressions are actually the same under simplification.
       
  1951  This is not entirely true for annotated regular expressions: 
       
  1952  %TODO: bsimp bders \neq bderssimp
       
  1953  \begin{center}
       
  1954  $(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
       
  1955  \end{center}
       
  1956  For bit-codes, the order in which simplification is applied
       
  1957  might cause a difference in the location they are placed.
       
  1958  If we want something like
       
  1959  \begin{center}
       
  1960  $\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$
       
  1961  \end{center}
       
  1962  Some "canonicalization" procedure is required,
       
  1963  which either pushes all the common bitcodes to nodes
       
  1964   as senior as possible:
       
  1965   \begin{center}
       
  1966   $_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $
       
  1967   \end{center}
       
  1968  or does the reverse. However bitcodes are not of interest if we are talking about
       
  1969  the $\llbracket r \rrbracket$ size of a regex.
       
  1970  Therefore for the ease and simplicity of producing a
       
  1971  proof for a size bound, we are happy to restrict ourselves to 
       
  1972  unannotated regular expressions, and obtain such equalities as
       
  1973  \begin{lemma}
       
  1974  $\rsimp{r_1 + r_2} = \rsimp{\RALTS{\hflataux{r_1} @ \hflataux{r_2}}}$
       
  1975  \end{lemma}
       
  1976  
       
  1977  \begin{proof}
       
  1978  By using the rewriting relation $\rightsquigarrow$
       
  1979  \end{proof}
       
  1980  %TODO: rsimp sflat
       
  1981 And from this we obtain a proof that a star's derivative will be the same
       
  1982 as if it had all its nested alternatives created during deriving being flattened out:
       
  1983  For example,
       
  1984  \begin{lemma}
       
  1985  $\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
       
  1986  \end{lemma}
       
  1987  \begin{proof}
       
  1988  By structural induction on $r$, where the induction rules are these of $\createdByStar{_}$.
       
  1989  \end{proof}
       
  1990 % The simplification of a flattened out regular expression, provided it comes
       
  1991 %from the derivative of a star, is the same as the one nested.
       
  1992  
       
  1993  
  1990  
  1994  
  1991  
  1995  
  1992  
  1996  
  1993  
  1997  
  1994