ChengsongTanPhdThesis/Chapters/Cubic.tex
changeset 620 ae6010c14e49
parent 613 b0f0d884a547
child 621 17c7611fb0a9
equal deleted inserted replaced
619:2072a8d54e3e 620:ae6010c14e49
    40 We update our formalisation of 
    40 We update our formalisation of 
    41 the correctness and finiteness properties to
    41 the correctness and finiteness properties to
    42 include this new construct. With bounded repetitions
    42 include this new construct. With bounded repetitions
    43 we are able to out-compete other verified lexers such as
    43 we are able to out-compete other verified lexers such as
    44 Verbatim++ on regular expressions which involve a lot of
    44 Verbatim++ on regular expressions which involve a lot of
    45 repetitions. We also present the idempotency property proof
    45 repetitions. %We also present the idempotency property proof
    46 of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
    46 %of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
    47 This reinforces our claim that the fixpoint construction
    47 %This reinforces our claim that the fixpoint construction
    48 originally required by Sulzmann and Lu can be removed in $\blexersimp$.
    48 %originally required by Sulzmann and Lu can be removed in $\blexersimp$.
    49 \\
    49 \\
    50 Last but not least, we present our efforts and challenges we met
    50 Last but not least, we present our efforts and challenges we met
    51 in further improving the algorithm by data
    51 in further improving the algorithm by data
    52 structures such as zippers.
    52 structures such as zippers.
    53 
    53 
   539 add one more line which can be solved by sledgehammer 
   539 add one more line which can be solved by sledgehammer 
   540 to solve the $r^{\{n\}}$ inductive case.
   540 to solve the $r^{\{n\}}$ inductive case.
   541 
   541 
   542 
   542 
   543 \subsubsection{Finiteness Proof Augmentation}
   543 \subsubsection{Finiteness Proof Augmentation}
   544 The bounded repetitions can be handled
   544 The bounded repetitions are
   545 using techniques similar to stars.
   545 very similar to stars, and therefore the treatment
       
   546 is similar, with minor changes to handle some slight complications
       
   547 when the counter reaches 0.
       
   548 The exponential growth is similar:
       
   549 \begin{center}
       
   550 	\begin{tabular}{ll}
       
   551 		$r^{\{n\}} $ & $\longrightarrow_{\backslash c}$\\
       
   552 		$(r\backslash c)  \cdot  
       
   553 		r^{\{n - 1\}}*$ & $\longrightarrow_{\backslash c'}$\\
       
   554 		\\
       
   555 		$r \backslash cc'  \cdot r^{\{n - 2\}}* + 
       
   556 		r \backslash c' \cdot r^{\{n - 1\}}*$ &
       
   557 		$\longrightarrow_{\backslash c''}$\\
       
   558 		\\
       
   559 		$(r_1 \backslash cc'c'' \cdot r^{\{n-3\}}* + 
       
   560 		r \backslash c''\cdot r^{\{n-1\}}) + 
       
   561 		(r \backslash c'c'' \cdot r^{\{n-2\}}* + 
       
   562 		r \backslash c'' \cdot r^{\{n-1\}}*)$ & 
       
   563 		$\longrightarrow_{\backslash c'''}$ \\
       
   564 		\\
       
   565 		$\ldots$\\
       
   566 	\end{tabular}
       
   567 \end{center}
       
   568 Again, we assume that $r\backslash c$, $r \backslash cc'$ and so on
       
   569 are all nullable.
       
   570 The flattened list of terms for $r^{\{n\}} \backslash_{rs} s$
       
   571 \begin{center}
       
   572 	$[r_1 \backslash cc'c'' \cdot r^{\{n-3\}}*,\;
       
   573 	r \backslash c''\cdot r^{\{n-1\}}, \; 
       
   574 	r \backslash c'c'' \cdot r^{\{n-2\}}*, \;
       
   575 	r \backslash c'' \cdot r^{\{n-1\}}*,\; \ldots ]$  
       
   576 \end{center}
       
   577 that comes from 
       
   578 \begin{center}
       
   579 		$(r_1 \backslash cc'c'' \cdot r^{\{n-3\}}* + 
       
   580 		r \backslash c''\cdot r^{\{n-1\}}) + 
       
   581 		(r \backslash c'c'' \cdot r^{\{n-2\}}* + 
       
   582 		r \backslash c'' \cdot r^{\{n-1\}}*)+ \ldots$ 
       
   583 \end{center}
       
   584 are made of sequences with different tails, where the counters
       
   585 might differ.
       
   586 The observation for maintaining the bound is that
       
   587 these counters never exceed $n$, the original
       
   588 counter. With the number of counters staying finite,
       
   589 $\rDistinct$ will deduplicate and keep the list finite.
       
   590 We introduce this idea as a lemma once we describe all
       
   591 the necessary helper functions.
       
   592 
       
   593 Similar to the star case, we want
       
   594 \begin{center}
       
   595 	$\rderssimp{r^{\{n\}}}{s} = \rsimp{\sum rs}$.
       
   596 \end{center}
       
   597 where $rs$
       
   598 shall be in the form of 
       
   599 $\map \; f \; Ss$, where $f$ is a function and
       
   600 $Ss$ a list of objects to act on.
       
   601 For star, the object's datatype is string.
       
   602 The list of strings $Ss$
       
   603 is generated using functions 
       
   604 $\starupdate$ and $\starupdates$.
       
   605 The function that takes a string and returns a regular expression
       
   606 is the anonymous function $
       
   607 (\lambda s'. \; r\backslash s' \cdot r^{\{m\}})$.
       
   608 In the NTIMES setting,
       
   609 the $\starupdate$ and $\starupdates$ functions are replaced by 
       
   610 $\textit{nupdate}$ and $\textit{nupdates}$:
       
   611 \begin{center}
       
   612 	\begin{tabular}{lcl}
       
   613 		$\nupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
       
   614 		$\nupdate \; c \; r \; 
       
   615 		(\Some \; (s, \; n + 1) \; :: \; Ss)$ & $\dn$ & %\\
       
   616 						     $\textit{if} \; 
       
   617 						     (\rnullable \; (r \backslash_{rs} s))$ \\
       
   618 						     & & $\;\;\textit{then} 
       
   619 						     \;\; \Some \; (s @ [c], n + 1) :: \Some \; ([c], n) :: (
       
   620 						     \nupdate \; c \; r \; Ss)$ \\
       
   621 						     & & $\textit{else} \;\; \Some \; (s @ [c], n+1) :: (
       
   622 						     \nupdate \; c \; r \; Ss)$\\
       
   623 		$\nupdate \; c \; r \; (\textit{None} :: Ss)$ & $\dn$ & 
       
   624 		$(\None :: \nupdate  \; c \; r \; Ss)$\\
       
   625 							      & & \\
       
   626 	%\end{tabular}
       
   627 %\end{center}
       
   628 %\begin{center}
       
   629 	%\begin{tabular}{lcl}
       
   630 		$\nupdates \; [] \; r \; Ss$ & $\dn$ & $Ss$\\
       
   631 		$\nupdates \; (c :: cs) \; r \; Ss$ &  $\dn$ &  $\nupdates \; cs \; r \; (
       
   632 		\nupdate \; c \; r \; Ss)$
       
   633 	\end{tabular}
       
   634 \end{center}
       
   635 \noindent
       
   636 which take into account when a subterm
       
   637 \begin{center}
       
   638 	$r \backslash_s s \cdot r^{\{n\}}$
       
   639 \end{center}
       
   640 counter $n$
       
   641 is 0, and therefore expands to 
       
   642 \begin{center}
       
   643 $r \backslash_s (s@[c]) \cdot r^{\{n\}} \;+
       
   644 \; \ZERO$ 
       
   645 \end{center}
       
   646 after taking a derivative.
       
   647 The object now has type 
       
   648 \begin{center}
       
   649 $\textit{option} \;(\textit{string}, \textit{nat})$
       
   650 \end{center}
       
   651 and therefore the function for converting such an option into
       
   652 a regular expression term is called $\opterm$:
       
   653 
       
   654 \begin{center}
       
   655 	\begin{tabular}{lcl}
       
   656 	$\opterm \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\
       
   657 				 & & $\;\;\Some \; (s, n) \Rightarrow 
       
   658 				 (r\backslash_{rs} s)\cdot r^{\{n\}}$\\
       
   659 				 & & $\;\;\None  \Rightarrow 
       
   660 				 \ZERO$\\
       
   661 	\end{tabular}
       
   662 \end{center}
       
   663 \noindent
       
   664 Put together, the list $\map \; f \; Ss$ is instantiated as
       
   665 \begin{center}
       
   666 	$\map \; (\opterm \; r) \; (\nupdates \; s \; r \;
       
   667 	[\Some \; ([c], n)])$.
       
   668 \end{center}
       
   669 For the closed form to be bounded, we would like
       
   670 simplification to be applied to each term in the list.
       
   671 Therefore we introduce some variants of $\opterm$,
       
   672 which help conveniently express the rewriting steps 
       
   673 needed in the closed form proof.
       
   674 \begin{center}
       
   675 	\begin{tabular}{lcl}
       
   676 	$\optermOsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\
       
   677 				 & & $\;\;\Some \; (s, n) \Rightarrow 
       
   678 				 \textit{rsimp} \; ((r\backslash_{rs} s)\cdot r^{\{n\}})$\\
       
   679 				 & & $\;\;\None  \Rightarrow 
       
   680 				 \ZERO$\\
       
   681 				 \\
       
   682 	$\optermosimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\
       
   683 				 & & $\;\;\Some \; (s, n) \Rightarrow 
       
   684 				 (\textit{rsimp} \; (r\backslash_{rs} s)) 
       
   685 				 \cdot r^{\{n\}}$\\
       
   686 				 & & $\;\;\None  \Rightarrow 
       
   687 				 \ZERO$\\
       
   688 				 \\
       
   689 	$\optermsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\
       
   690 				 & & $\;\;\Some \; (s, n) \Rightarrow 
       
   691 				 (r\backslash_{rsimps} s)\cdot r^{\{n\}}$\\
       
   692 				 & & $\;\;\None  \Rightarrow 
       
   693 				 \ZERO$\\
       
   694 	\end{tabular}
       
   695 \end{center}
       
   696 
       
   697 
       
   698 For a list of 
       
   699 $\textit{option} \;(\textit{string}, \textit{nat})$ elements,
       
   700 we define the highest power for it recursively:
       
   701 \begin{center}
       
   702 	\begin{tabular}{lcl}
       
   703 		$\hpa \; [] \; n $ & $\dn$ & $n$\\
       
   704 		$\hpa \; (\None :: os) \; n $ &  $\dn$ &  $\hpa \; os \; n$\\
       
   705 		$\hpa \; (\Some \; (s, n) :: os) \; m$ & $\dn$ & 
       
   706 		$\hpa \;os \; (\textit{max} \; n\; m)$\\
       
   707 		\\
       
   708 		$\hpower \; rs $ & $\dn$ & $\hpa \; rs \; 0$\\
       
   709 	\end{tabular}
       
   710 \end{center}
       
   711 
       
   712 Now the intuition that an NTIMES regular expression's power
       
   713 does not increase can be easily expressed as
       
   714 \begin{lemma}\label{nupdatesMono2}
       
   715 	$\hpower \; (\nupdates \;s \; r \; [\Some \; ([c], n)]) \leq n$
       
   716 \end{lemma}
       
   717 \begin{proof}
       
   718 	Note that the power is non-increasing after a $\nupdate$ application:
       
   719 	\begin{center}
       
   720 		$\hpa \;\; (\nupdate \; c \; r \; Ss)\;\; m \leq 
       
   721 		 \hpa\; \; Ss \; m$.
       
   722 	 \end{center}
       
   723 	 This is also the case for $\nupdates$:
       
   724 	\begin{center}
       
   725 		$\hpa \;\; (\nupdates \; s \; r \; Ss)\;\; m \leq 
       
   726 		 \hpa\; \; Ss \; m$.
       
   727 	 \end{center}
       
   728 	 Therefore we have that
       
   729 	 \begin{center}
       
   730 		 $\hpower \;\; (\nupdates \; s \; r \; Ss) \leq
       
   731 		  \hpower \;\; Ss$
       
   732 	 \end{center}
       
   733 	 which leads to the lemma being proven.
       
   734 
       
   735  \end{proof}
       
   736 
       
   737 
       
   738 We also define the inductive rules for
       
   739 the shape of derivatives of the NTIMES regular expressions:\\[-3em]
       
   740 \begin{center}
       
   741 	\begin{mathpar}
       
   742 		\inferrule{\mbox{}}{\cbn \;\ZERO}
       
   743 
       
   744 		\inferrule{\mbox{}}{\cbn \; \; r_a \cdot (r^{\{n\}})}
       
   745 
       
   746 		\inferrule{\cbn \; r_1 \;\; \; \cbn \; r_2}{\cbn \; r_1 + r_2}
       
   747 
       
   748 		\inferrule{\cbn \; r}{\cbn \; r + \ZERO}
       
   749 	\end{mathpar}
       
   750 \end{center}
       
   751 \noindent
       
   752 A derivative of NTIMES fits into the shape described by $\cbn$:
       
   753 \begin{lemma}\label{ntimesDersCbn}
       
   754 	$\cbn \; ((r' \cdot r^{\{n\}}) \backslash_{rs} s)$ holds.
       
   755 \end{lemma}
       
   756 \begin{proof}
       
   757 	By a reverse induction on $s$.
       
   758 	For the inductive case, note that if $\cbn \; r$ holds,
       
   759 	then $\cbn \; (r\backslash_r c)$ holds.
       
   760 \end{proof}
       
   761 In addition, for $\cbn$-shaped regular expressioins, one can flatten
       
   762 them:
       
   763 \begin{lemma}\label{ntimesHfauPushin}
       
   764 	If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} = 
       
   765 	\textit{concat} \; (\map \; \hflataux{\map \; (\_\backslash_r c) \;
       
   766 	(\hflataux{r})})$
       
   767 \end{lemma}
       
   768 \begin{proof}
       
   769 	By an induction on the inductive cases of $\cbn$.
       
   770 \end{proof}
       
   771 		
       
   772 
       
   773 
       
   774 This time we do not need to define the flattening functions for NTIMES only,
       
   775 because $\hflat{\_}$ and $\hflataux{\_}$ work on NTIMES already:
       
   776 \begin{lemma}\label{ntimesHfauInduct}
       
   777 $\hflataux{( (r\backslash_r c) \cdot r^{\{n\}}) \backslash_{rsimps} s} = 
       
   778  \map \; (\opterm \; r) \; (\nupdates \; s \; r \; [\Some \; ([c], n)])$
       
   779 \end{lemma}
       
   780 \begin{proof}
       
   781 	By a reverse induction on $s$.
       
   782 	The lemma \ref{ntimesDersCbn} is used.
       
   783 \end{proof}
       
   784 \noindent
       
   785 We have a recursive property for NTIMES with $\nupdate$ 
       
   786 similar to that for STAR,
       
   787 and one for $\nupdates $ as well:
       
   788 \begin{lemma}\label{nupdateInduct1}
       
   789 	(i) $\textit{concat} \; (\map \; (\hflataux{\_} \circ (
       
   790 	\opterm \; r)) \; Ss) = \map \; (\opterm \; r) \; (\nupdate \;
       
   791 	c \; r \; Ss)$\\
       
   792 	(ii) $\textit{concat} \; (\map \; \hflataux{\_}\; 
       
   793 	\map \; (\_\backslash_r x) \;
       
   794 		(\map \; (\opterm \; r) \; (\nupdates \; xs \; r \; Ss))) =
       
   795 	\map \; (\opterm \; r) \; (\nupdates \;(xs@[x]) \; r\;Ss)$ holds.
       
   796 \end{lemma}
       
   797 \begin{proof}
       
   798 	(i) is by an induction on $Ss$.
       
   799 	(ii) is by an induction on $xs$.
       
   800 \end{proof}
       
   801 
       
   802 The $\nString$ predicate is defined for conveniently
       
   803 expressing that there are no empty strings in the
       
   804 $\Some \;(s, n)$ elements generated by $\nupdate$:
       
   805 \begin{center}
       
   806 	\begin{tabular}{lcl}
       
   807 		$\nString \; \None$  & $\dn$ & $ \textit{true}$\\
       
   808 		$\nString \; (\Some \; ([], n))$ & $\dn$ & $ \textit{false}$\\
       
   809 		$\nString \; (\Some \; (c::s, n))$  & $\dn$ & $ \textit{true}$\\
       
   810 	\end{tabular}
       
   811 \end{center}
       
   812 \begin{lemma}\label{nupdatesNonempty}
       
   813 	If for all elements $opt$ in $\textit{set} \; Ss$,
       
   814 	$\nString \; opt$ holds, the we have that
       
   815 	for all elements $opt$ in $\textit{set} \; (\nupdates \; s \; r \; Ss)$,
       
   816 	$\nString \; opt$ holds.
       
   817 \end{lemma}
       
   818 \begin{proof}
       
   819 	By an induction on $s$, where $Ss$ is set to vary over all possible values.
       
   820 \end{proof}
       
   821 
       
   822 \noindent
       
   823 
       
   824 \begin{lemma}\label{ntimesClosedFormsSteps}
       
   825 	The following list of equalities or rewriting relations hold:\\
       
   826 	(i) $r^{\{n+1\}} \backslash_{rsimps} (c::s) = 
       
   827 	\textit{rsimp} \; (\sum (\map \; (\opterm \;r \;\_) \; (\nupdates \;
       
   828 	s \; r \; [\Some \; ([c], n)])))$\\
       
   829 	(ii)
       
   830 	\begin{center}
       
   831 	$\sum (\map \; (\opterm \; r) \; (\nupdates \; s \; r \; [
       
   832 	\Some \; ([c], n)]))$ \\ $ \sequal$\\
       
   833 	 $\sum (\map \; (\textit{rsimp} \circ (\opterm \; r))\; (\nupdates \;
       
   834 	 s\;r \; [\Some \; ([c], n)]))$\\
       
   835  	\end{center}
       
   836 	(iii)
       
   837 	\begin{center}
       
   838 	$\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \;
       
   839 	([c], n)]))$\\ 
       
   840 	$\sequal$\\
       
   841 	 $\sum \;(\map \; (\optermsimp r) \; (\nupdates \; s \; r \; [\Some \;
       
   842 	([c], n)])) $\\
       
   843 	\end{center}
       
   844 	(iv)
       
   845 	\begin{center}
       
   846 	$\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \;
       
   847 	([c], n)])) $ \\ $\sequal$\\
       
   848 	 $\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \;
       
   849 	([c], n)])) $\\
       
   850 	\end{center}
       
   851 	(v)
       
   852 	\begin{center}
       
   853 	 $\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \;
       
   854 	 ([c], n)])) $ \\ $\sequal$\\
       
   855 	  $\sum \; (\map \; (\textit{rsimp} \circ (\opterm \; r)) \;
       
   856 	  (\nupdates \; s \; r \; [\Some \; ([c], n)]))$
       
   857   	\end{center}
       
   858 \end{lemma}
       
   859 \begin{proof}
       
   860 	Routine.
       
   861 	(iii) and (iv) make use of the fact that all the strings $s$
       
   862 	inside $\Some \; (s, m)$ which are elements of the list
       
   863 	$\nupdates \; s\;r\;[\Some\; ([c], n)]$ are non-empty,
       
   864 	which is from lemma \ref{nupdatesNonempty}.
       
   865 	Once the string in $o = \Some \; (s, n)$ is 
       
   866 	nonempty, $\optermsimp \; r \;o$,
       
   867 	$\optermosimp \; r \; o$ and $\optermosimp \; \; o$ are guaranteed
       
   868 	to be equal.
       
   869 \end{proof}
       
   870 
       
   871 \begin{theorem}\label{ntimesClosedForm}
       
   872 	The derivative of $r^{\{n+1\}}$ can be described as an alternative
       
   873 	containing a list
       
   874 	of terms:\\
       
   875 	$r^{\{n+1\}} \backslash_{rsimps} (c::s) = \textit{rsimp} \; (
       
   876 	\sum (\map \; (\optermsimp \; r) \; (\nupdates \; s \; r \;
       
   877 	[\Some \; ([c], n)])))$
       
   878 \end{theorem}
       
   879 \begin{proof}
       
   880 	By the rewriting steps described in lemma \ref{ntimesClosedFormsSteps}.
       
   881 \end{proof}
       
   882 
       
   883 \begin{lemma}\label{nupdatesNLeqN}
       
   884 	For an element $o$ in $\textit{set} \; (\nupdates \; s \; r \;
       
   885 	[\Some \; ([c], n)])$, either $o = \None$, or $o = \Some
       
   886 	\; (s', m)$ for some string $s'$ and number $m \leq n$.
       
   887 \end{lemma}
       
   888 
       
   889 
       
   890 \begin{lemma}\label{ntimesClosedFormListElemShape}
       
   891 	For any element $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; (
       
   892 	\nupdates \; s \; r \; [\Some \; ([c], n)]))$,
       
   893 	we have that $r'$ is either $\ZERO$ or $r \backslash_{rsimps} s' \cdot
       
   894 	r^{\{m\}}$ for some string $s'$ and number $m \leq n$.
       
   895 \end{lemma}
       
   896 \begin{proof}
       
   897 	Using lemma \ref{nupdatesNLeqN}.
       
   898 \end{proof}
       
   899 
       
   900 \begin{theorem}\label{ntimesClosedFormBounded}
       
   901 	Assuming that for any string $s$, $\llbracket r \backslash_{rsimps} s
       
   902 	\rrbracket_r \leq N$ holds, then we have that\\
       
   903 	$\llbracket r^{\{n+1\}} \backslash_{rsimps} s \rrbracket_r \leq
       
   904 	\textit{max} \; (c_N+1)* (N + \llbracket r^{\{n\}} \rrbracket+1)$,
       
   905 	where $c_N = \textit{card} \; (\textit{sizeNregex} \; (
       
   906 	N + \llbracket r^{\{n\}} \rrbracket_r+1))$.
       
   907 \end{theorem}
       
   908 \begin{proof}
       
   909 We have that for all regular expressions $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; (
       
   910 	\nupdates \; s \; r \; [\Some \; ([c], n)]))$,
       
   911 	$r'$'s size is less than or equal to $N + \llbracket r^{\{n\}} 
       
   912 	\rrbracket_r + 1$
       
   913 because $r'$ can only be either a $\ZERO$ or $r \backslash_{rsimps} s' \cdot
       
   914 r^{\{m\}}$ for some string $s'$ and number 
       
   915 $m \leq n$ (lemma \ref{ntimesClosedFormListElemShape}).
       
   916 In addition, we know that the list 
       
   917 $\map \; (\optermsimp \; r) \; (
       
   918 \nupdates \; s \; r \; [\Some \; ([c], n)])$'s size is at most
       
   919 $c_N = \textit{card} \; 
       
   920 (\sizeNregex \; ((N + \llbracket r^{\{n\}} \rrbracket) + 1))$.
       
   921 This gives us $\llbracket \llbracket r \backslash_{rsimps} \;s \llbracket_r
       
   922 \leq N * c_N$.
       
   923 \end{proof}
       
   924 
       
   925 
       
   926 Assuming we have that
       
   927 \begin{center}
       
   928 	$\rderssimp{r^*}{s} = \rsimp{(\sum \map \; (\lambda s'. r\backslash s' \cdot r^*) \; Ss)}$.
       
   929 \end{center}
       
   930 holds.
       
   931 The idea of $\starupdate$ and $\starupdates$
       
   932 is to update $Ss$ when another
       
   933 derivative is taken on $\rderssimp{r^*}{s}$
       
   934 w.r.t a character $c$ and a string $s'$
       
   935 respectively.
       
   936 Both $\starupdate$ and $\starupdates$ take three arguments as input:
       
   937 the new character $c$ or string $s$ to take derivative with, 
       
   938 the regular expression
       
   939 $r$ under the star $r^*$, and the
       
   940 list of strings $Ss$ for the derivative $r^* \backslash s$ 
       
   941 up until this point  
       
   942 such that 
       
   943 \begin{center}
       
   944 $(r^*) \backslash s = \sum_{s' \in sSet} (r\backslash s') \cdot r^*$ 
       
   945 \end{center}
       
   946 is satisfied.
       
   947 
       
   948 Functions $\starupdate$ and $\starupdates$ characterise what the 
       
   949 star derivatives will look like once ``straightened out'' into lists.
       
   950 The helper functions for such operations will be similar to
       
   951 $\sflat{\_}$, $\sflataux{\_}$ and $\sflataux{\_}$, which we defined for sequence.
       
   952 We use similar symbols to denote them, with a $*$ subscript to mark the difference.
       
   953 \begin{center}
       
   954 	\begin{tabular}{lcl}
       
   955 		$\hflataux{r_1 + r_2}$ & $\dn$ & $\hflataux{r_1} @ \hflataux{r_2}$\\
       
   956 		$\hflataux{r}$ & $\dn$ & $[r]$
       
   957 	\end{tabular}
       
   958 \end{center}
       
   959 
       
   960 \begin{center}
       
   961 	\begin{tabular}{lcl}
       
   962 		$\hflat{r_1 + r_2}$ & $\dn$ & $\sum (\hflataux {r_1} @ \hflataux {r_2}) $\\
       
   963 		$\hflat{r}$ & $\dn$ & $r$
       
   964 	\end{tabular}
       
   965 \end{center}
       
   966 \noindent
       
   967 These definitions are tailor-made for dealing with alternatives that have
       
   968 originated from a star's derivatives.
       
   969 A typical star derivative always has the structure of a balanced binary tree:
       
   970 \begin{center}
       
   971 	$(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + 
       
   972 	(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ 
       
   973 \end{center}
       
   974 All of the nested structures of alternatives
       
   975 generated from derivatives are binary, and therefore
       
   976 $\hflat{\_}$ and $\hflataux{\_}$ only deal with binary alternatives.
       
   977 $\hflat{\_}$ ``untangles'' like the following:
       
   978 \[
       
   979 	\sum ((r_1 + r_2) + (r_3 + r_4))  + \ldots \;
       
   980 	\stackrel{\hflat{\_}}{\longrightarrow} \;
       
   981 	\RALTS{[r_1, r_2, \ldots, r_n]} 
       
   982 \]
       
   983 Here is a lemma stating the recursive property of $\starupdate$ and $\starupdates$,
       
   984 with the helpers $\hflat{\_}$ and $\hflataux{\_}$\footnote{The function $\textit{concat}$ takes a list of lists 
       
   985 			and merges each of the element lists to form a flattened list.}:
       
   986 \begin{lemma}\label{stupdateInduct1}
       
   987 	\mbox
       
   988 	For a list of strings $Ss$, the following hold.
       
   989 	\begin{itemize}
       
   990 		\item
       
   991 			If we do a derivative on the terms 
       
   992 			$r\backslash_r s \cdot r^*$ (where $s$ is taken from the list $Ss$),
       
   993 			the result will be the same as if we apply $\starupdate$ to $Ss$.
       
   994 			\begin{center}
       
   995 				\begin{tabular}{c}
       
   996 			$\textit{concat} \; (\map \; (\hflataux{\_} \circ ( (\_\backslash_r x)
       
   997 			\circ (\lambda s.\;\; (r \backslash_r s) \cdot r^*)))\; Ss )\;
       
   998 			$\\
       
   999 			\\
       
  1000 			$=$ \\
       
  1001 			\\
       
  1002 			$\map \; (\lambda s. (r \backslash_r s) \cdot (r^*)) \; 
       
  1003 			(\starupdate \; x \; r \; Ss)$
       
  1004 				\end{tabular}
       
  1005 			\end{center}
       
  1006 		\item
       
  1007 			$\starupdates$ is ``composable'' w.r.t a derivative.
       
  1008 			It piggybacks the character $x$ to the tail of the string
       
  1009 			$xs$.
       
  1010 			\begin{center}
       
  1011 				\begin{tabular}{c}
       
  1012 					$\textit{concat} \; (\map \; \hflataux{\_} \; 
       
  1013 					(\map \; (\_\backslash_r x) \; 
       
  1014 					(\map \; (\lambda s.\;\; (r \backslash_r s) \cdot 
       
  1015 					(r^*) ) \; (\starupdates \; xs \; r \; Ss))))$\\
       
  1016 					\\
       
  1017 					$=$\\
       
  1018 					\\
       
  1019 					$\map \; (\lambda s.\;\; (r\backslash_r s) \cdot (r^*)) \;
       
  1020 					(\starupdates \; (xs @ [x]) \; r \; Ss)$
       
  1021 				\end{tabular}
       
  1022 			\end{center}
       
  1023 	\end{itemize}
       
  1024 \end{lemma}
       
  1025 			
       
  1026 \begin{proof}
       
  1027 	Part 1 is by induction on $Ss$.
       
  1028 	Part 2 is by induction on $xs$, where $Ss$ is left to take arbitrary values.
       
  1029 \end{proof}
       
  1030 			
       
  1031 
       
  1032 Like $\textit{createdBySequence}$, we need
       
  1033 a predicate for ``star-created'' regular expressions:
       
  1034 \begin{center}
       
  1035 	\begin{mathpar}
       
  1036 		\inferrule{\mbox{}}{ \textit{createdByStar}\; \RSEQ{ra}{\RSTAR{rb}} }
       
  1037 
       
  1038 		\inferrule{  \textit{createdByStar} \; r_1\; \land  \; \textit{createdByStar} \; r_2 }{\textit{createdByStar} \; (r_1 + r_2) } 
       
  1039 	\end{mathpar}
       
  1040 \end{center}
       
  1041 \noindent
       
  1042 All regular expressions created by taking derivatives of
       
  1043 $r_1 \cdot (r_2)^*$ satisfy the $\textit{createdByStar}$ predicate:
       
  1044 \begin{lemma}\label{starDersCbs}
       
  1045 	$\textit{createdByStar} \; ((r_1 \cdot r_2^*) \backslash_r s) $ holds.
       
  1046 \end{lemma}
       
  1047 \begin{proof}
       
  1048 	By a reverse induction on $s$.
       
  1049 \end{proof}
       
  1050 If a regular expression conforms to the shape of a star's derivative,
       
  1051 then we can push an application of $\hflataux{\_}$ inside a derivative of it:
       
  1052 \begin{lemma}\label{hfauPushin}
       
  1053 	If $\textit{createdByStar} \; r$ holds, then
       
  1054 	\begin{center}
       
  1055 		$\hflataux{r \backslash_r c} = \textit{concat} \; (
       
  1056 		\map \; \hflataux{\_} (\map \; (\_\backslash_r c) \;(\hflataux{r})))$
       
  1057 	\end{center}
       
  1058 	holds.
       
  1059 \end{lemma}
       
  1060 \begin{proof}
       
  1061 	By an induction on the inductivev cases of $\textit{createdByStar}$.
       
  1062 \end{proof}
       
  1063 %This is not entirely true for annotated regular expressions: 
       
  1064 %%TODO: bsimp bders \neq bderssimp
       
  1065 %\begin{center}
       
  1066 %	$(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
       
  1067 %\end{center}
       
  1068 %For bit-codes, the order in which simplification is applied
       
  1069 %might cause a difference in the location they are placed.
       
  1070 %If we want something like
       
  1071 %\begin{center}
       
  1072 %	$\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$
       
  1073 %\end{center}
       
  1074 %Some "canonicalization" procedure is required,
       
  1075 %which either pushes all the common bitcodes to nodes
       
  1076 %as senior as possible:
       
  1077 %\begin{center}
       
  1078 %	$_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $
       
  1079 %\end{center}
       
  1080 %or does the reverse. However bitcodes are not of interest if we are talking about
       
  1081 %the $\llbracket r \rrbracket$ size of a regex.
       
  1082 %Therefore for the ease and simplicity of producing a
       
  1083 %proof for a size bound, we are happy to restrict ourselves to 
       
  1084 %unannotated regular expressions, and obtain such equalities as
       
  1085 %TODO: rsimp sflat
       
  1086 % The simplification of a flattened out regular expression, provided it comes
       
  1087 %from the derivative of a star, is the same as the one nested.
       
  1088 
       
  1089 
       
  1090 
       
  1091 Now we introduce an inductive property
       
  1092 for $\starupdate$ and $\hflataux{\_}$.
       
  1093 \begin{lemma}\label{starHfauInduct}
       
  1094 	If we do derivatives of $r^*$
       
  1095 	with a string that starts with $c$,
       
  1096 	then flatten it out,
       
  1097 	we obtain a list
       
  1098 	of the shape $\sum_{s' \in sS} (r\backslash_r s') \cdot r^*$,
       
  1099 	where $sS = \starupdates \; s \; r \; [[c]]$. Namely,
       
  1100 	\begin{center}
       
  1101 		$\hflataux{(\rders{( (\rder{c}{r_0})\cdot(r_0^*))}{s})} = 
       
  1102 		\map \; (\lambda s_1. (r_0 \backslash_r s_1) \cdot (r_0^*)) \; 
       
  1103 		(\starupdates \; s \; r_0 \; [[c]])$
       
  1104 	\end{center}
       
  1105 holds.
       
  1106 \end{lemma}
       
  1107 \begin{proof}
       
  1108 	By an induction on $s$, the inductive cases
       
  1109 	being $[]$ and $s@[c]$. The lemmas \ref{hfauPushin} and \ref{starDersCbs} are used.
       
  1110 \end{proof}
       
  1111 \noindent
       
  1112 
       
  1113 $\hflataux{\_}$ has a similar effect as $\textit{flatten}$:
       
  1114 \begin{lemma}\label{hflatauxGrewrites}
       
  1115 	$a :: rs \grewrites \hflataux{a} @ rs$
       
  1116 \end{lemma}
       
  1117 \begin{proof}
       
  1118 	By induction on $a$. $rs$ is set to take arbitrary values.
       
  1119 \end{proof}
       
  1120 It is also not surprising that $\textit{rsimp}$ will cover
       
  1121 the effect of $\hflataux{\_}$:
       
  1122 \begin{lemma}\label{cbsHfauRsimpeq1}
       
  1123 	$\rsimp{(r_1 + r_2)} = \rsimp{(\RALTS{\hflataux{r_1} @ \hflataux{r_2}})}$
       
  1124 \end{lemma}
       
  1125 
       
  1126 \begin{proof}
       
  1127 	By using the rewriting relation $\rightsquigarrow$
       
  1128 \end{proof}
       
  1129 And from this we obtain a proof that a star's derivative will be the same
       
  1130 as if it had all its nested alternatives created during deriving being flattened out:
       
  1131 For example,
       
  1132 \begin{lemma}\label{hfauRsimpeq2}
       
  1133 	$\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
       
  1134 \end{lemma}
       
  1135 \begin{proof}
       
  1136 	By structural induction on $r$, where the induction rules 
       
  1137 	are these of $\createdByStar{\_}$.
       
  1138 	Lemma \ref{cbsHfauRsimpeq1} is used in the inductive case.
       
  1139 \end{proof}
       
  1140 
       
  1141 
       
  1142 %Here is a corollary that states the lemma in
       
  1143 %a more intuitive way:
       
  1144 %\begin{corollary}
       
  1145 %	$\hflataux{r^* \backslash_r (c::xs)} = \map \; (\lambda s. (r \backslash_r s) \cdot
       
  1146 %	(r^*))\; (\starupdates \; c\; r\; [[c]])$
       
  1147 %\end{corollary}
       
  1148 %\noindent
       
  1149 %Note that this is also agnostic of the simplification
       
  1150 %function we defined, and is therefore of more general interest.
       
  1151 
       
  1152 Together with the rewriting relation
       
  1153 \begin{lemma}\label{starClosedForm6Hrewrites}
       
  1154 	We have the following set of rewriting relations or equalities:
       
  1155 	\begin{itemize}
       
  1156 		\item
       
  1157 			$\textit{rsimp} \; (r^* \backslash_r (c::s)) 
       
  1158 			\sequal
       
  1159 			\sum \; ( ( \sum (\lambda s. (r\backslash_r s) \cdot r^*) \; (
       
  1160 			\starupdates \; s \; r \; [ c::[]] ) ) )$
       
  1161 		\item
       
  1162 			$r \backslash_{rsimp} (c::s) = \textit{rsimp} \; ( (
       
  1163 			\sum ( (\map \; (\lambda s_1. (r\backslash s_1) \; r^*) \;
       
  1164 			(\starupdates \;s \; r \; [ c::[] ])))))$
       
  1165 		\item
       
  1166 			$\sum ( (\map \; (\lambda s. (r\backslash s) \; r^*) \; Ss))
       
  1167 			\sequal
       
  1168 			 \sum ( (\map \; (\lambda s. \textit{rsimp} \; (r\backslash s) \;
       
  1169 			 r^*) \; Ss) )$
       
  1170 		\item
       
  1171 			$\map \; (\lambda s. (\rsimp{r \backslash_r s}) \cdot (r^*)) \; Ss
       
  1172 			\scfrewrites
       
  1173 			\map \; (\lambda s. (\rsimp{r \backslash_r s}) \cdot (r^*)) \; Ss$
       
  1174 		\item
       
  1175 			$( ( \sum ( ( \map \ (\lambda s. \;\;
       
  1176 			(\textit{rsimp} \; (r \backslash_r s)) \cdot r^*) \; (\starupdates \;
       
  1177 			s \; r \; [ c::[] ])))))$\\
       
  1178 			$\sequal$\\
       
  1179 			$( ( \sum ( ( \map \ (\lambda s. \;\;
       
  1180 			( r \backslash_{rsimp} s)) \cdot r^*) \; (\starupdates \;
       
  1181 			s \; r \; [ c::[] ]))))$\\
       
  1182 	\end{itemize}
       
  1183 \end{lemma}
       
  1184 \begin{proof}
       
  1185 	Part 1 leads to part 2.
       
  1186 	The rest of them are routine.
       
  1187 \end{proof}
       
  1188 \noindent
       
  1189 Next the closed form for star regular expressions can be derived:
       
  1190 \begin{theorem}\label{starClosedForm}
       
  1191 	$\rderssimp{r^*}{c::s} = 
       
  1192 	\rsimp{
       
  1193 		(\sum (\map \; (\lambda s. (\rderssimp{r}{s})\cdot r^*) \; 
       
  1194 		(\starupdates \; s\; r \; [[c]])
       
  1195 		)
       
  1196 		)
       
  1197 	}
       
  1198 	$
       
  1199 \end{theorem}
       
  1200 \begin{proof}
       
  1201 	By an induction on $s$.
       
  1202 	The lemmas \ref{rsimpIdem}, \ref{starHfauInduct}, \ref{starClosedForm6Hrewrites} 
       
  1203 	and \ref{hfauRsimpeq2}
       
  1204 	are used.	
       
  1205 	In \ref{starClosedForm6Hrewrites}, the equalities are
       
  1206 	used to link the LHS and RHS.
       
  1207 \end{proof}
       
  1208 
       
  1209 
       
  1210 
       
  1211 
   546 The closed form for them looks like:
  1212 The closed form for them looks like:
   547 %\begin{center}
  1213 %\begin{center}
   548 %	\begin{tabular}{llrclll}
  1214 %	\begin{tabular}{llrclll}
   549 %		$r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & \\
  1215 %		$r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & \\
   550 %		$\textit{rsimp}$ & $($ & $
  1216 %		$\textit{rsimp}$ & $($ & $
   595 		$\starupdates \; (c :: cs) \; r \; Ss$ &  $=$ &  $\starupdates \; cs \; r \; (
  1261 		$\starupdates \; (c :: cs) \; r \; Ss$ &  $=$ &  $\starupdates \; cs \; r \; (
   596 		\starupdate \; c \; r \; Ss)$
  1262 		\starupdate \; c \; r \; Ss)$
   597 	\end{tabular}
  1263 	\end{tabular}
   598 \end{center}
  1264 \end{center}
   599 \noindent
  1265 \noindent
       
  1266 
       
  1267 
       
  1268 
   600 
  1269 
   601 %----------------------------------------------------------------------------------------
  1270 %----------------------------------------------------------------------------------------
   602 %	SECTION 1
  1271 %	SECTION 1
   603 %----------------------------------------------------------------------------------------
  1272 %----------------------------------------------------------------------------------------
   604 
  1273 
   661 % \[
  1330 % \[
   662 % \mkeps(~(r))  = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
  1331 % \mkeps(~(r))  = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
   663 % \]
  1332 % \]
   664 % 
  1333 % 
   665 %
  1334 %
       
  1335 %\begin{center}
       
  1336 %	\begin{tabular}{lcl}
       
  1337 %		$\ntset \; r \; (n+1) \; c::cs $ & $\dn$ & $\nupdates \;
       
  1338 %		cs \; r \; [\Some \; ([c], n)]$\\
       
  1339 %		$\ntset \; r\; 0 \; \_$ &  $\dn$ &  $\None$\\
       
  1340 %		$\ntset \; r \; \_ \; [] $ & $ \dn$ & $[]$\\
       
  1341 %	\end{tabular}
       
  1342 %\end{center}