ChengsongTanPhdThesis/Chapters/Cubic.tex
changeset 621 17c7611fb0a9
parent 620 ae6010c14e49
child 625 b797c9a709d9
equal deleted inserted replaced
620:ae6010c14e49 621:17c7611fb0a9
    37 Animirov's partial derivatives.\\
    37 Animirov's partial derivatives.\\
    38 Secondly, we extend our $\blexersimp$
    38 Secondly, we extend our $\blexersimp$
    39 to support bounded repetitions ($r^{\{n\}}$).
    39 to support bounded repetitions ($r^{\{n\}}$).
    40 We update our formalisation of 
    40 We update our formalisation of 
    41 the correctness and finiteness properties to
    41 the correctness and finiteness properties to
    42 include this new construct. With bounded repetitions
    42 include this new construct. 
    43 we are able to out-compete other verified lexers such as
    43 we can out-compete other verified lexers such as
    44 Verbatim++ on regular expressions which involve a lot of
    44 Verbatim++ on bounded regular expressions.
    45 repetitions. %We also present the idempotency property proof
    45 %We also present the idempotency property proof
    46 %of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
    46 %of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
    47 %This reinforces our claim that the fixpoint construction
    47 %This reinforces our claim that the fixpoint construction
    48 %originally required by Sulzmann and Lu can be removed in $\blexersimp$.
    48 %originally required by Sulzmann and Lu can be removed in $\blexersimp$.
    49 \\
    49 
    50 Last but not least, we present our efforts and challenges we met
    50 %Last but not least, we present our efforts and challenges we met
    51 in further improving the algorithm by data
    51 %in further improving the algorithm by data
    52 structures such as zippers.
    52 %structures such as zippers.
    53 
    53 
    54 
    54 
    55 
    55 
    56 %----------------------------------------------------------------------------------------
    56 %----------------------------------------------------------------------------------------
    57 %	SECTION strongsimp
    57 %	SECTION strongsimp
    88 \caption{Desired simplification, but not done in $\blexersimp$}\label{partialDedup}
    88 \caption{Desired simplification, but not done in $\blexersimp$}\label{partialDedup}
    89 \end{figure}
    89 \end{figure}
    90 \noindent
    90 \noindent
    91 A simplification like this actually
    91 A simplification like this actually
    92 cannot be omitted,
    92 cannot be omitted,
    93 as without it the size could blow up even with our $\simp$ function:
    93 as without it the size could blow up even with our $\textit{bsimp}$ 
       
    94 function: for the chapter \ref{Finite} example
       
    95 $\protect((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$,
       
    96 by just setting n to a small number,
       
    97 we get exponential growth that does not stop before it becomes huge: 
    94 \begin{figure}[H]
    98 \begin{figure}[H]
    95 \centering
    99 \centering
    96 \begin{tikzpicture}
   100 \begin{tikzpicture}
    97 \begin{axis}[
   101 \begin{axis}[
    98     %xlabel={$n$},
   102     %xlabel={$n$},
   117 \caption{Desired simplification, but not done in $\blexersimp$}\label{desiredSimp}
   121 \caption{Desired simplification, but not done in $\blexersimp$}\label{desiredSimp}
   118 \end{figure}
   122 \end{figure}
   119 \noindent
   123 \noindent
   120 in our $\simp$ function,
   124 in our $\simp$ function,
   121 so that it makes the simplification in \ref{partialDedup} possible.
   125 so that it makes the simplification in \ref{partialDedup} possible.
   122 For example,
   126 Translating the rule into our $\textit{bsimp}$ function simply
   123 can a function like the below one be used?
   127 involves adding a new clause to the $\textit{bsimp}_{ASEQ}$ function:
   124 %TODO: simp' that spills things
   128 \begin{center}
       
   129 	\begin{tabular}{@{}lcl@{}}
       
   130 		$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
       
   131 						       && $\ldots$\\
       
   132    &&$\quad\textit{case} \; (_{bs1}\sum as, a_2') \Rightarrow _{bs1}\sum (
       
   133    \map \; (_{[]}\textit{ASEQ} \; \_ \; a_2') \; as)$\\
       
   134    &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$ \\
       
   135 	\end{tabular}
       
   136 \end{center}
       
   137 
       
   138 
   125 
   139 
   126 Unfortunately,
   140 Unfortunately,
   127 if we introduce them in our
   141 if we introduce them in our
   128 setting we would lose the POSIX property of our calculated values. 
   142 setting we would lose the POSIX property of our calculated values. 
   129 For example given the regular expression 
   143 For example given the regular expression 
   169 \end{center}
   183 \end{center}
   170 using  a function similar to $\distinctBy$,
   184 using  a function similar to $\distinctBy$,
   171 but this time 
   185 but this time 
   172 we allow a more general list rewrite:
   186 we allow a more general list rewrite:
   173 \begin{mathpar}\label{cubicRule}
   187 \begin{mathpar}\label{cubicRule}
   174 		\inferrule{\vspace{0mm} }{rs_a@[a]@rs_c 
   188 	\inferrule * [Right = cubicRule]{\vspace{0mm} }{rs_a@[a]@rs_c 
   175 			\stackrel{s}{\rightsquigarrow }
   189 			\stackrel{s}{\rightsquigarrow }
   176 		rs_a@[\textit{prune} \; a \; rs_a]@rs_c }
   190 		rs_a@[\textit{prune} \; a \; rs_a]@rs_c }
   177 \end{mathpar}
   191 \end{mathpar}
   178 %L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)
   192 %L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)
   179 where $\textit{prune} \;a \; acc$ traverses $a$
   193 where $\textit{prune} \;a \; acc$ traverses $a$
   197 function $\textit{prune}$ in Scala,
   211 function $\textit{prune}$ in Scala,
   198 and incorporated into our lexer,
   212 and incorporated into our lexer,
   199 by replacing the $\simp$ function 
   213 by replacing the $\simp$ function 
   200 with a stronger version called $\bsimpStrong$
   214 with a stronger version called $\bsimpStrong$
   201 that prunes regular expressions.
   215 that prunes regular expressions.
   202 We call this lexer $\blexerStrong$.
       
   203 $\blexerStrong$ is able to drastically reduce the
       
   204 internal data structure size which could
       
   205 trigger exponential behaviours in
       
   206 $\blexersimp$.
       
   207 \begin{figure}[H]
   216 \begin{figure}[H]
   208 \centering
   217 
   209 \begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
       
   210 \begin{tikzpicture}
       
   211 \begin{axis}[
       
   212     %xlabel={$n$},
       
   213     myplotstyle,
       
   214     xlabel={input length},
       
   215     ylabel={size},
       
   216     ]
       
   217 \addplot[red,mark=*, mark options={fill=white}] table {strongSimpCurve.data};
       
   218 \end{axis}
       
   219 \end{tikzpicture}
       
   220   &
       
   221 \begin{tikzpicture}
       
   222 \begin{axis}[
       
   223     %xlabel={$n$},
       
   224     myplotstyle,
       
   225     xlabel={input length},
       
   226     ylabel={size},
       
   227     ]
       
   228 \addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
       
   229 \end{axis}
       
   230 \end{tikzpicture}\\
       
   231 \multicolumn{2}{l}{Graphs: Runtime for matching $((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings 
       
   232 	   of the form $\underbrace{aa..a}_{n}$.}
       
   233 \end{tabular}    
       
   234 \caption{aaaaaStarStar} \label{fig:aaaaaStarStar}
       
   235 \end{figure}
       
   236 \begin{figure}[H]
       
   237 \begin{lstlisting}
   218 \begin{lstlisting}
   238     def atMostEmpty(r: Rexp) : Boolean = r match {
   219     def atMostEmpty(r: Rexp) : Boolean = r match {
   239       case ZERO => true
   220       case ZERO => true
   240       case ONE => true
   221       case ONE => true
   241       case STAR(r) => atMostEmpty(r)
   222       case STAR(r) => atMostEmpty(r)
   395         }
   376         }
   396       }
   377       }
   397 \end{lstlisting}
   378 \end{lstlisting}
   398 \end{figure}
   379 \end{figure}
   399 \noindent
   380 \noindent
       
   381 We call this lexer $\blexerStrong$.
       
   382 $\blexerStrong$ is able to drastically reduce the
       
   383 internal data structure size which could
       
   384 trigger exponential behaviours in
       
   385 $\blexersimp$.
       
   386 \begin{figure}[H]
       
   387 \centering
       
   388 \begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
       
   389 \begin{tikzpicture}
       
   390 \begin{axis}[
       
   391     %xlabel={$n$},
       
   392     myplotstyle,
       
   393     xlabel={input length},
       
   394     ylabel={size},
       
   395     width = 7cm,
       
   396     height = 5cm,
       
   397     ]
       
   398 \addplot[red,mark=*, mark options={fill=white}] table {strongSimpCurve.data};
       
   399 \end{axis}
       
   400 \end{tikzpicture}
       
   401   &
       
   402 \begin{tikzpicture}
       
   403 \begin{axis}[
       
   404     %xlabel={$n$},
       
   405     myplotstyle,
       
   406     xlabel={input length},
       
   407     ylabel={size},
       
   408     width = 7cm,
       
   409     height = 5cm,
       
   410     ]
       
   411 \addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
       
   412 \end{axis}
       
   413 \end{tikzpicture}\\
       
   414 \multicolumn{2}{l}{}
       
   415 \end{tabular}    
       
   416 \caption{Runtime for matching 
       
   417 	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings 
       
   418 	   of the form $\protect\underbrace{aa..a}_{n}$.}\label{fig:aaaaaStarStar}
       
   419 \end{figure}
       
   420 \noindent
   400 We would like to preserve the correctness like the one 
   421 We would like to preserve the correctness like the one 
   401 we had for $\blexersimp$:
   422 we had for $\blexersimp$:
   402 \begin{conjecture}\label{cubicConjecture}
   423 \begin{conjecture}\label{cubicConjecture}
   403 	$\blexerStrong \;r \; s = \blexer\; r\;s$
   424 	$\blexerStrong \;r \; s = \blexer\; r\;s$
   404 \end{conjecture}
   425 \end{conjecture}
   405 \noindent
   426 \noindent
   406 We introduce the new rule \ref{cubicRule}
   427 The idea is to maintain key lemmas in
   407 and augment our proofs in chapter \ref{Bitcoded2}.
   428 chapter \ref{Bitcoded2} like
   408 The idea is to maintain the properties like
   429 $r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$
   409 $r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$ etc.
   430 with the new rewriting rule \ref{cubicRule} .
   410 
   431 
   411 In the next section,
   432 In the next sub-section,
   412 we will describe why we 
   433 we will describe why we 
   413 believe a cubic bound can be achieved.
   434 believe a cubic bound can be achieved.
   414 We give an introduction to the
   435 We give an introduction to the
   415 partial derivatives,
   436 partial derivatives,
   416 which was invented by Antimirov \cite{Antimirov95},
   437 which was invented by Antimirov \cite{Antimirov95},
   417 and then link it with the result of the function
   438 and then link it with the result of the function
   418 $\bdersStrongs$.
   439 $\bdersStrongs$.
   419  
   440  
   420 \section{Antimirov's partial derivatives}
   441 \subsection{Antimirov's partial derivatives}
   421 This suggests a link towrads "partial derivatives"
   442 Partial derivatives were first introduced by 
   422  introduced  The idea behind Antimirov's partial derivatives
   443 Antimirov \cite{Antimirov95}.
   423 is to do derivatives in a similar way as suggested by Brzozowski, 
   444 It does derivatives in a similar way as suggested by Brzozowski, 
   424 but maintain a set of regular expressions instead of a single one:
   445 but splits children of alternative regular expressions into 
   425 
   446 multiple independent terms, causing the output to become a 
   426 %TODO: antimirov proposition 3.1, needs completion
   447 set of regular expressions:
   427  \begin{center}  
   448 \begin{center}  
   428  \begin{tabular}{ccc}
   449  	\begin{tabular}{lcl}
   429  $\partial_x(a+b)$ & $=$ & $\partial_x(a) \cup \partial_x(b)$\\
   450 		$\partial_x \; (a \cdot b)$ & 
   430 $\partial_x(\ONE)$ & $=$ & $\phi$
   451 		$\dn$ & $\partial_x \; a\cdot b \cup
   431 \end{tabular}
   452 		\partial_x \; b \; \textit{if} \; \; \nullable\; a$\\
   432 \end{center}
   453 		      & & $\partial_x \; a\cdot b \quad\quad 
   433 
   454 		      \textit{otherwise}$\\ 
       
   455 		$\partial_x \; r^*$ & $\dn$ & $\partial_x \; r \cdot r^*$\\
       
   456 		$\partial_x \; c $ & $\dn$ & $\textit{if} \; x = c \; 
       
   457 		\textit{then} \;
       
   458 		\{ \ONE\} \;\;\textit{else} \; \varnothing$\\
       
   459 		$\partial_x(a+b)$ & $=$ & $\partial_x(a) \cup \partial_x(b)$\\
       
   460 		$\partial_x(\ONE)$ & $=$ & $\varnothing$\\
       
   461 		$\partial_x(\ZERO)$ & $\dn$ & $\varnothing$\\
       
   462 	\end{tabular}
       
   463 \end{center}
       
   464 \noindent
       
   465 The $\cdot$ between for example 
       
   466 $\partial_x \; a\cdot b $ 
       
   467 is a shorthand notation for the cartesian product 
       
   468 $\partial_x \; a \times \{ b\}$.
       
   469 %Each element in the set generated by a partial derivative
       
   470 %corresponds to a (potentially partial) match
       
   471 %TODO: define derivatives w.r.t string s
   434 Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together
   472 Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together
   435 using the alternatives constructor, Antimirov cleverly chose to put them into
   473 using the $\sum$ constructor, Antimirov put them into
   436 a set instead.  This breaks the terms in a derivative regular expression up, 
   474 a set.  This causes maximum de-duplication to happen, 
   437 allowing us to understand what are the "atomic" components of it.
   475 allowing us to understand what are the "atomic" components of it.
   438 For example, To compute what regular expression $x^*(xx + y)^*$'s 
   476 For example, To compute what regular expression $x^*(xx + y)^*$'s 
   439 derivative against $x$ is made of, one can do a partial derivative
   477 derivative against $x$ is made of, one can do a partial derivative
   440 of it and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
   478 of it and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
   441 from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.
   479 from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.
   442 To get all the "atomic" components of a regular expression's possible derivatives,
   480 
   443 there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes
   481 The set of all possible partial derivatives is defined
   444 whatever character is available at the head of the string inside the language of a
   482 as the union of derivatives w.r.t all the strings in the universe:
   445 regular expression, and gives back the character and the derivative regular expression
   483 \begin{center}
   446 as a pair (which he called "monomial"):
   484 	\begin{tabular}{lcl}
   447  \begin{center}  
   485 		$\textit{PDER}_{UNIV} \; r $ & $\dn $ & $\bigcup_{w \in A^*}\partial_w \; r$
   448  \begin{tabular}{ccc}
   486 	\end{tabular}
   449  $\lf(\ONE)$ & $=$ & $\phi$\\
   487 \end{center}
   450 $\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\
   488 \noindent
   451  $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
   489 
   452  $\lf(r^*)$ & $=$ & $\lf(r) \bigodot \lf(r^*)$\\
   490 Back to our 
   453 \end{tabular}
   491 \begin{center}
   454 \end{center}
   492 	$((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$
   455 %TODO: completion
   493 \end{center}
   456 
   494 example, if we denote this regular expression as $A$,
   457 There is a slight difference in the last three clauses compared
   495 we have that
   458 with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular 
   496 \begin{center}
   459 expression $r$ with every element inside $\textit{rset}$ to create a set of 
   497 $\textit{PDER}_{UNIV} \; A = 
   460 sequence derivatives, it uses the "circle dot" operator $\bigodot$ which operates 
   498 \bigcup_{i=1}^{n}\bigcup_{j=0}^{i-1} \{ 
   461 on a set of monomials (which Antimirov called "linear form") and a regular 
   499 	(\underbrace{a \ldots a}_{\text{j a's}}\cdot
   462 expression, and returns a linear form:
   500 (\underbrace{a \ldots a}_{\text{i a's}})^*)\cdot A \}$, 
   463  \begin{center}  
   501 \end{center}
   464  \begin{tabular}{ccc}
   502 with exactly $n * (n + 1) / 2$ terms.
   465  $l \bigodot (\ZERO)$ & $=$ & $\phi$\\
   503 This is in line with our speculation that only $n*(n+1)/2$ terms are
   466  $l \bigodot (\ONE)$ & $=$ & $l$\\
   504 needed. We conjecture that $\bsimpStrong$ is also able to achieve this
   467  $\phi \bigodot t$ & $=$ & $\phi$\\
   505 upper limit in general
   468  $\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\
   506 \begin{conjecture}\label{bsimpStrongInclusionPder}
   469  $\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\
   507 	Using a suitable transformation $f$, we have 
   470   $\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\
   508 	\begin{center}
   471  $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
   509 		$\forall s.\; f \; (r \bdersStrong  \; s) \subseteq
   472  $\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\
   510 		 \textit{PDER}_{UNIV} \; r$
   473 \end{tabular}
   511 	\end{center}
   474 \end{center}
   512 \end{conjecture}
   475 %TODO: completion
   513 \noindent
   476 
   514 because our \ref{cubicRule} will keep only one copy of each term,
   477  Some degree of simplification is applied when doing $\bigodot$, for example,
   515 where the function $\textit{prune}$ takes care of maintaining
   478  $l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,
   516 a set like structure similar to partial derivatives.
   479  and  $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and
   517 It is anticipated we might need to adjust $\textit{prune}$
   480   $\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot x \rightsquigarrow \ZERO$,
   518 slightly to make sure all duplicate terms are eliminated,
   481   and so on.
   519 which should be doable.
   482   
   520 
   483   With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regular expression $r$ with 
   521 Antimirov had proven that the sum of all the partial derivative 
   484   an iterative procedure:
   522 terms' sizes is bounded by the cubic of the size of that regular
   485    \begin{center}  
   523 expression:
   486  \begin{tabular}{llll}
   524 \begin{property}\label{pderBound}
   487 $\textit{while}$ & $(\Delta_i \neq \phi)$  &                &          \\
   525 	$\llbracket \textit{PDER}_{UNIV} \; r \rrbracket \leq O((\llbracket r \rrbracket)^3)$
   488  		       &  $\Delta_{i+1}$           &        $ =$ & $\lf(\Delta_i) - \PD_i$ \\
   526 \end{property}
   489 		       &  $\PD_{i+1}$              &         $ =$ & $\Delta_{i+1} \cup \PD_i$ \\
   527 This property was formalised by Urban, and the details are in the PDERIVS.thy file
   490 $\partial_{UNIV}(r)$ & $=$ & $\PD$ &		     
   528 in our repository.
   491 \end{tabular}
   529 Once conjecture \ref{bsimpStrongInclusionPder} is proven, then property \ref{pderBound}
   492 \end{center}
   530 would yield us a cubic bound for our $\blexerStrong$ algorithm: 
   493 
   531 \begin{conjecture}\label{strongCubic}
   494 
   532 	$\llbracket r \bdersStrong\; s \rrbracket \leq \llbracket r \rrbracket^3$
   495  $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
   533 \end{conjecture}
       
   534 
       
   535 
       
   536 %To get all the "atomic" components of a regular expression's possible derivatives,
       
   537 %there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes
       
   538 %whatever character is available at the head of the string inside the language of a
       
   539 %regular expression, and gives back the character and the derivative regular expression
       
   540 %as a pair (which he called "monomial"):
       
   541 % \begin{center}  
       
   542 % \begin{tabular}{ccc}
       
   543 % $\lf(\ONE)$ & $=$ & $\phi$\\
       
   544 %$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\
       
   545 % $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
       
   546 % $\lf(r^*)$ & $=$ & $\lf(r) \bigodot \lf(r^*)$\\
       
   547 %\end{tabular}
       
   548 %\end{center}
       
   549 %%TODO: completion
       
   550 %
       
   551 %There is a slight difference in the last three clauses compared
       
   552 %with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular 
       
   553 %expression $r$ with every element inside $\textit{rset}$ to create a set of 
       
   554 %sequence derivatives, it uses the "circle dot" operator $\bigodot$ which operates 
       
   555 %on a set of monomials (which Antimirov called "linear form") and a regular 
       
   556 %expression, and returns a linear form:
       
   557 % \begin{center}  
       
   558 % \begin{tabular}{ccc}
       
   559 % $l \bigodot (\ZERO)$ & $=$ & $\phi$\\
       
   560 % $l \bigodot (\ONE)$ & $=$ & $l$\\
       
   561 % $\phi \bigodot t$ & $=$ & $\phi$\\
       
   562 % $\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\
       
   563 % $\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\
       
   564 %  $\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\
       
   565 % $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
       
   566 % $\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\
       
   567 %\end{tabular}
       
   568 %\end{center}
       
   569 %%TODO: completion
       
   570 %
       
   571 % Some degree of simplification is applied when doing $\bigodot$, for example,
       
   572 % $l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,
       
   573 % and  $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and
       
   574 %  $\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot x \rightsquigarrow \ZERO$,
       
   575 %  and so on.
       
   576 %  
       
   577 %  With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regular expression $r$ with 
       
   578 %  an iterative procedure:
       
   579 %   \begin{center}  
       
   580 % \begin{tabular}{llll}
       
   581 %$\textit{while}$ & $(\Delta_i \neq \phi)$  &                &          \\
       
   582 % 		       &  $\Delta_{i+1}$           &        $ =$ & $\lf(\Delta_i) - \PD_i$ \\
       
   583 %		       &  $\PD_{i+1}$              &         $ =$ & $\Delta_{i+1} \cup \PD_i$ \\
       
   584 %$\partial_{UNIV}(r)$ & $=$ & $\PD$ &		     
       
   585 %\end{tabular}
       
   586 %\end{center}
       
   587 %
       
   588 %
       
   589 % $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
   496 
   590 
   497 
   591 
   498 
   592 
   499 
   593 
   500 %----------------------------------------------------------------------------------------
   594 %----------------------------------------------------------------------------------------
   510 the ``exactly-$n$-times'' bounded regular expression $r^{\{n\}}$.
   604 the ``exactly-$n$-times'' bounded regular expression $r^{\{n\}}$.
   511 We add clauses in our derivatives-based lexing algorithms (with simplifications)
   605 We add clauses in our derivatives-based lexing algorithms (with simplifications)
   512 introduced in chapter \ref{Bitcoded2}.
   606 introduced in chapter \ref{Bitcoded2}.
   513 
   607 
   514 \subsection{Augmented Definitions}
   608 \subsection{Augmented Definitions}
   515 There are a number of definitions that needs to be augmented.
   609 There are a number of definitions that need to be augmented.
   516 The most notable one would be the POSIX rules for $r^{\{n\}}$:
   610 The most notable one would be the POSIX rules for $r^{\{n\}}$:
   517 \begin{mathpar}
   611 \begin{center}
   518 	\inferrule{v \in vs}{\textit{Stars} place holder}
   612 	\begin{mathpar}
   519 \end{mathpar}
   613 		\inferrule{\forall v \in vs_1. \vdash v:r \land 
       
   614 		|v| \neq []\\ \forall v \in vs_2. \vdash v:r \land |v| = []\\
       
   615 		\textit{length} \; (vs_1 @ vs_2) = n}{\textit{Stars} \;
       
   616 		(vs_1 @ vs_2) : r^{\{n\}} }
       
   617 	\end{mathpar}
       
   618 \end{center}
       
   619 As Ausaf had pointed out \cite{Ausaf},
       
   620 sometimes empty iterations have to be taken to get
       
   621 a match with exactly $n$ repetitions,
       
   622 and hence the $vs_2$ part.
       
   623 
       
   624 Another important definition would be the size:
       
   625 \begin{center}
       
   626 	\begin{tabular}{lcl}
       
   627 		$\llbracket r^{\{n\}} \rrbracket_r$ & $\dn$ & 
       
   628 		$\llbracket r \rrbracket_r + n$\\
       
   629 	\end{tabular}
       
   630 \end{center}
       
   631 \noindent
       
   632 Arguably we should use $\log \; n$ for the size because
       
   633 the number of digits increase logarithmically w.r.t $n$.
       
   634 For simplicity we choose to add the counter directly to the size.
       
   635 
       
   636 The derivative w.r.t a bounded regular expression
       
   637 is given as 
       
   638 \begin{center}
       
   639 	\begin{tabular}{lcl}
       
   640 		$r^{\{n\}} \backslash_r c$ & $\dn$ & 
       
   641 		$r\backslash_r c \cdot r^{\{n-1\}} \;\; \textit{if} \; n \geq 1$\\
       
   642 					   & & $\RZERO \;\quad \quad\quad \quad
       
   643 					   \textit{otherwise}$\\
       
   644 	\end{tabular}
       
   645 \end{center}
       
   646 \noindent
       
   647 For brevity, we sometimes use NTIMES to refer to bounded 
       
   648 regular expressions.
       
   649 The $\mkeps$ function clause for NTIMES would be
       
   650 \begin{center}
       
   651 	\begin{tabular}{lcl}
       
   652 		$\mkeps \; r^{\{n\}} $ & $\dn$ & $\Stars \;
       
   653 		(\textit{replicate} \; n\; (\mkeps \; r))$
       
   654 	\end{tabular}
       
   655 \end{center}
       
   656 \noindent
       
   657 The injection looks like
       
   658 \begin{center}
       
   659 	\begin{tabular}{lcl}
       
   660 		$\inj \; r^{\{n\}} \; c\; (\Seq \;v \; (\Stars \; vs)) $ & 
       
   661 		$\dn$ & $\Stars \;
       
   662 		((\inj \; r \;c \;v ) :: vs)$
       
   663 	\end{tabular}
       
   664 \end{center}
       
   665 \noindent
   520 
   666 
   521 
   667 
   522 \subsection{Proofs for the Augmented Lexing Algorithm}
   668 \subsection{Proofs for the Augmented Lexing Algorithm}
   523 We need to maintain two proofs with the additional $r^{\{n\}}$
   669 We need to maintain two proofs with the additional $r^{\{n\}}$
   524 construct: the 
   670 construct: the 
   756 \begin{proof}
   902 \begin{proof}
   757 	By a reverse induction on $s$.
   903 	By a reverse induction on $s$.
   758 	For the inductive case, note that if $\cbn \; r$ holds,
   904 	For the inductive case, note that if $\cbn \; r$ holds,
   759 	then $\cbn \; (r\backslash_r c)$ holds.
   905 	then $\cbn \; (r\backslash_r c)$ holds.
   760 \end{proof}
   906 \end{proof}
       
   907 \noindent
   761 In addition, for $\cbn$-shaped regular expressioins, one can flatten
   908 In addition, for $\cbn$-shaped regular expressioins, one can flatten
   762 them:
   909 them:
   763 \begin{lemma}\label{ntimesHfauPushin}
   910 \begin{lemma}\label{ntimesHfauPushin}
   764 	If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} = 
   911 	If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} = 
   765 	\textit{concat} \; (\map \; \hflataux{\map \; (\_\backslash_r c) \;
   912 	\textit{concat} \; (\map \; \hflataux{\map \; (\_\backslash_r c) \;
   766 	(\hflataux{r})})$
   913 	(\hflataux{r})})$
   767 \end{lemma}
   914 \end{lemma}
   768 \begin{proof}
   915 \begin{proof}
   769 	By an induction on the inductive cases of $\cbn$.
   916 	By an induction on the inductive cases of $\cbn$.
   770 \end{proof}
   917 \end{proof}
   771 		
   918 \noindent
   772 
       
   773 
       
   774 This time we do not need to define the flattening functions for NTIMES only,
   919 This time we do not need to define the flattening functions for NTIMES only,
   775 because $\hflat{\_}$ and $\hflataux{\_}$ work on NTIMES already:
   920 because $\hflat{\_}$ and $\hflataux{\_}$ work on NTIMES already.
   776 \begin{lemma}\label{ntimesHfauInduct}
   921 \begin{lemma}\label{ntimesHfauInduct}
   777 $\hflataux{( (r\backslash_r c) \cdot r^{\{n\}}) \backslash_{rsimps} s} = 
   922 $\hflataux{( (r\backslash_r c) \cdot r^{\{n\}}) \backslash_{rsimps} s} = 
   778  \map \; (\opterm \; r) \; (\nupdates \; s \; r \; [\Some \; ([c], n)])$
   923  \map \; (\opterm \; r) \; (\nupdates \; s \; r \; [\Some \; ([c], n)])$
   779 \end{lemma}
   924 \end{lemma}
   780 \begin{proof}
   925 \begin{proof}
   781 	By a reverse induction on $s$.
   926 	By a reverse induction on $s$.
   782 	The lemma \ref{ntimesDersCbn} is used.
   927 	The lemmas \ref{ntimesHfauPushin} and \ref{ntimesDersCbn} are used.
   783 \end{proof}
   928 \end{proof}
   784 \noindent
   929 \noindent
   785 We have a recursive property for NTIMES with $\nupdate$ 
   930 We have a recursive property for NTIMES with $\nupdate$ 
   786 similar to that for STAR,
   931 similar to that for STAR,
   787 and one for $\nupdates $ as well:
   932 and one for $\nupdates $ as well:
   788 \begin{lemma}\label{nupdateInduct1}
   933 \begin{lemma}\label{nupdateInduct1}
   789 	(i) $\textit{concat} \; (\map \; (\hflataux{\_} \circ (
   934 	\mbox{}
       
   935 	\begin{itemize}
       
   936 		\item
       
   937 			\begin{center}
       
   938 	 $\textit{concat} \; (\map \; (\hflataux{\_} \circ (
   790 	\opterm \; r)) \; Ss) = \map \; (\opterm \; r) \; (\nupdate \;
   939 	\opterm \; r)) \; Ss) = \map \; (\opterm \; r) \; (\nupdate \;
   791 	c \; r \; Ss)$\\
   940 	c \; r \; Ss)$\\
   792 	(ii) $\textit{concat} \; (\map \; \hflataux{\_}\; 
   941 	\end{center}
       
   942 	holds.
       
   943 \item
       
   944 	\begin{center}
       
   945 	 $\textit{concat} \; (\map \; \hflataux{\_}\; 
   793 	\map \; (\_\backslash_r x) \;
   946 	\map \; (\_\backslash_r x) \;
   794 		(\map \; (\opterm \; r) \; (\nupdates \; xs \; r \; Ss))) =
   947 		(\map \; (\opterm \; r) \; (\nupdates \; xs \; r \; Ss)))$\\
   795 	\map \; (\opterm \; r) \; (\nupdates \;(xs@[x]) \; r\;Ss)$ holds.
   948 		$=$\\
       
   949 	$\map \; (\opterm \; r) \; (\nupdates \;(xs@[x]) \; r\;Ss)$ 
       
   950 	\end{center}
       
   951 	holds.
       
   952 	\end{itemize}
   796 \end{lemma}
   953 \end{lemma}
   797 \begin{proof}
   954 \begin{proof}
   798 	(i) is by an induction on $Ss$.
   955 	(i) is by an induction on $Ss$.
   799 	(ii) is by an induction on $xs$.
   956 	(ii) is by an induction on $xs$.
   800 \end{proof}
   957 \end{proof}
   801 
   958 \noindent
   802 The $\nString$ predicate is defined for conveniently
   959 The $\nString$ predicate is defined for conveniently
   803 expressing that there are no empty strings in the
   960 expressing that there are no empty strings in the
   804 $\Some \;(s, n)$ elements generated by $\nupdate$:
   961 $\Some \;(s, n)$ elements generated by $\nupdate$:
   805 \begin{center}
   962 \begin{center}
   806 	\begin{tabular}{lcl}
   963 	\begin{tabular}{lcl}
   808 		$\nString \; (\Some \; ([], n))$ & $\dn$ & $ \textit{false}$\\
   965 		$\nString \; (\Some \; ([], n))$ & $\dn$ & $ \textit{false}$\\
   809 		$\nString \; (\Some \; (c::s, n))$  & $\dn$ & $ \textit{true}$\\
   966 		$\nString \; (\Some \; (c::s, n))$  & $\dn$ & $ \textit{true}$\\
   810 	\end{tabular}
   967 	\end{tabular}
   811 \end{center}
   968 \end{center}
   812 \begin{lemma}\label{nupdatesNonempty}
   969 \begin{lemma}\label{nupdatesNonempty}
   813 	If for all elements $opt$ in $\textit{set} \; Ss$,
   970 	If for all elements $o \in \textit{set} \; Ss$,
   814 	$\nString \; opt$ holds, the we have that
   971 	$\nString \; o$ holds, the we have that
   815 	for all elements $opt$ in $\textit{set} \; (\nupdates \; s \; r \; Ss)$,
   972 	for all elements $o' \in \textit{set} \; (\nupdates \; s \; r \; Ss)$,
   816 	$\nString \; opt$ holds.
   973 	$\nString \; o'$ holds.
   817 \end{lemma}
   974 \end{lemma}
   818 \begin{proof}
   975 \begin{proof}
   819 	By an induction on $s$, where $Ss$ is set to vary over all possible values.
   976 	By an induction on $s$, where $Ss$ is set to vary over all possible values.
   820 \end{proof}
   977 \end{proof}
   821 
   978 
   864 	which is from lemma \ref{nupdatesNonempty}.
  1021 	which is from lemma \ref{nupdatesNonempty}.
   865 	Once the string in $o = \Some \; (s, n)$ is 
  1022 	Once the string in $o = \Some \; (s, n)$ is 
   866 	nonempty, $\optermsimp \; r \;o$,
  1023 	nonempty, $\optermsimp \; r \;o$,
   867 	$\optermosimp \; r \; o$ and $\optermosimp \; \; o$ are guaranteed
  1024 	$\optermosimp \; r \; o$ and $\optermosimp \; \; o$ are guaranteed
   868 	to be equal.
  1025 	to be equal.
       
  1026 	(v) uses \ref{nupdateInduct1}.
   869 \end{proof}
  1027 \end{proof}
   870 
  1028 \noindent
       
  1029 Now we are ready to present the closed form for NTIMES:
   871 \begin{theorem}\label{ntimesClosedForm}
  1030 \begin{theorem}\label{ntimesClosedForm}
   872 	The derivative of $r^{\{n+1\}}$ can be described as an alternative
  1031 	The derivative of $r^{\{n+1\}}$ can be described as an alternative
   873 	containing a list
  1032 	containing a list
   874 	of terms:\\
  1033 	of terms:\\
   875 	$r^{\{n+1\}} \backslash_{rsimps} (c::s) = \textit{rsimp} \; (
  1034 	$r^{\{n+1\}} \backslash_{rsimps} (c::s) = \textit{rsimp} \; (
   877 	[\Some \; ([c], n)])))$
  1036 	[\Some \; ([c], n)])))$
   878 \end{theorem}
  1037 \end{theorem}
   879 \begin{proof}
  1038 \begin{proof}
   880 	By the rewriting steps described in lemma \ref{ntimesClosedFormsSteps}.
  1039 	By the rewriting steps described in lemma \ref{ntimesClosedFormsSteps}.
   881 \end{proof}
  1040 \end{proof}
   882 
  1041 \noindent
       
  1042 The key observation for bounding this closed form
       
  1043 is that the counter on $r^{\{n\}}$ will 
       
  1044 only decrement during derivatives:
   883 \begin{lemma}\label{nupdatesNLeqN}
  1045 \begin{lemma}\label{nupdatesNLeqN}
   884 	For an element $o$ in $\textit{set} \; (\nupdates \; s \; r \;
  1046 	For an element $o$ in $\textit{set} \; (\nupdates \; s \; r \;
   885 	[\Some \; ([c], n)])$, either $o = \None$, or $o = \Some
  1047 	[\Some \; ([c], n)])$, either $o = \None$, or $o = \Some
   886 	\; (s', m)$ for some string $s'$ and number $m \leq n$.
  1048 	\; (s', m)$ for some string $s'$ and number $m \leq n$.
   887 \end{lemma}
  1049 \end{lemma}
   888 
  1050 \noindent
   889 
  1051 The proof is routine and therefore omitted.
       
  1052 This allows us to say what kind of terms
       
  1053 are in the list $\textit{set} \; (\map \; (\optermsimp \; r) \; (
       
  1054 \nupdates \; s \; r \; [\Some \; ([c], n)]))$:
       
  1055 only $\ZERO_r$s or a sequence with the tail an $r^{\{m\}}$
       
  1056 with a small $m$:
   890 \begin{lemma}\label{ntimesClosedFormListElemShape}
  1057 \begin{lemma}\label{ntimesClosedFormListElemShape}
   891 	For any element $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; (
  1058 	For any element $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; (
   892 	\nupdates \; s \; r \; [\Some \; ([c], n)]))$,
  1059 	\nupdates \; s \; r \; [\Some \; ([c], n)]))$,
   893 	we have that $r'$ is either $\ZERO$ or $r \backslash_{rsimps} s' \cdot
  1060 	we have that $r'$ is either $\ZERO$ or $r \backslash_{rsimps} s' \cdot
   894 	r^{\{m\}}$ for some string $s'$ and number $m \leq n$.
  1061 	r^{\{m\}}$ for some string $s'$ and number $m \leq n$.
   916 In addition, we know that the list 
  1083 In addition, we know that the list 
   917 $\map \; (\optermsimp \; r) \; (
  1084 $\map \; (\optermsimp \; r) \; (
   918 \nupdates \; s \; r \; [\Some \; ([c], n)])$'s size is at most
  1085 \nupdates \; s \; r \; [\Some \; ([c], n)])$'s size is at most
   919 $c_N = \textit{card} \; 
  1086 $c_N = \textit{card} \; 
   920 (\sizeNregex \; ((N + \llbracket r^{\{n\}} \rrbracket) + 1))$.
  1087 (\sizeNregex \; ((N + \llbracket r^{\{n\}} \rrbracket) + 1))$.
   921 This gives us $\llbracket \llbracket r \backslash_{rsimps} \;s \llbracket_r
  1088 This gives us $\llbracket r \backslash_{rsimps} \;s \rrbracket_r
   922 \leq N * c_N$.
  1089 \leq N * c_N$.
   923 \end{proof}
  1090 \end{proof}
   924 
  1091 
   925 
  1092 We aim to formalise the correctness and size bound
   926 Assuming we have that
  1093 for constructs like $r^{\{\ldots n\}}$, $r^{\{n \ldots\}}$
   927 \begin{center}
  1094 and so on, which is still work in progress.
   928 	$\rderssimp{r^*}{s} = \rsimp{(\sum \map \; (\lambda s'. r\backslash s' \cdot r^*) \; Ss)}$.
  1095 They should more or less follow the same recipe described in this section.
   929 \end{center}
  1096 Once we know about how to deal with them recursively using suitable auxiliary
   930 holds.
  1097 definitions, we are able to routinely establish the proofs.
   931 The idea of $\starupdate$ and $\starupdates$
  1098 
   932 is to update $Ss$ when another
  1099 
   933 derivative is taken on $\rderssimp{r^*}{s}$
  1100 %The closed form for them looks like:
   934 w.r.t a character $c$ and a string $s'$
  1101 %%\begin{center}
   935 respectively.
  1102 %%	\begin{tabular}{llrclll}
   936 Both $\starupdate$ and $\starupdates$ take three arguments as input:
  1103 %%		$r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & \\
   937 the new character $c$ or string $s$ to take derivative with, 
  1104 %%		$\textit{rsimp}$ & $($ & $
   938 the regular expression
  1105 %%		\sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
   939 $r$ under the star $r^*$, and the
  1106 %%			 & & & & $\textit{nupdates} \;$ & 
   940 list of strings $Ss$ for the derivative $r^* \backslash s$ 
  1107 %%			 $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
   941 up until this point  
  1108 %%			 & & & & $)$ &\\
   942 such that 
  1109 %%			 & &  $)$ & & &\\
   943 \begin{center}
  1110 %%			 & $)$ & & & &\\
   944 $(r^*) \backslash s = \sum_{s' \in sSet} (r\backslash s') \cdot r^*$ 
  1111 %%	\end{tabular}
   945 \end{center}
  1112 %%\end{center}
   946 is satisfied.
       
   947 
       
   948 Functions $\starupdate$ and $\starupdates$ characterise what the 
       
   949 star derivatives will look like once ``straightened out'' into lists.
       
   950 The helper functions for such operations will be similar to
       
   951 $\sflat{\_}$, $\sflataux{\_}$ and $\sflataux{\_}$, which we defined for sequence.
       
   952 We use similar symbols to denote them, with a $*$ subscript to mark the difference.
       
   953 \begin{center}
       
   954 	\begin{tabular}{lcl}
       
   955 		$\hflataux{r_1 + r_2}$ & $\dn$ & $\hflataux{r_1} @ \hflataux{r_2}$\\
       
   956 		$\hflataux{r}$ & $\dn$ & $[r]$
       
   957 	\end{tabular}
       
   958 \end{center}
       
   959 
       
   960 \begin{center}
       
   961 	\begin{tabular}{lcl}
       
   962 		$\hflat{r_1 + r_2}$ & $\dn$ & $\sum (\hflataux {r_1} @ \hflataux {r_2}) $\\
       
   963 		$\hflat{r}$ & $\dn$ & $r$
       
   964 	\end{tabular}
       
   965 \end{center}
       
   966 \noindent
       
   967 These definitions are tailor-made for dealing with alternatives that have
       
   968 originated from a star's derivatives.
       
   969 A typical star derivative always has the structure of a balanced binary tree:
       
   970 \begin{center}
       
   971 	$(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + 
       
   972 	(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ 
       
   973 \end{center}
       
   974 All of the nested structures of alternatives
       
   975 generated from derivatives are binary, and therefore
       
   976 $\hflat{\_}$ and $\hflataux{\_}$ only deal with binary alternatives.
       
   977 $\hflat{\_}$ ``untangles'' like the following:
       
   978 \[
       
   979 	\sum ((r_1 + r_2) + (r_3 + r_4))  + \ldots \;
       
   980 	\stackrel{\hflat{\_}}{\longrightarrow} \;
       
   981 	\RALTS{[r_1, r_2, \ldots, r_n]} 
       
   982 \]
       
   983 Here is a lemma stating the recursive property of $\starupdate$ and $\starupdates$,
       
   984 with the helpers $\hflat{\_}$ and $\hflataux{\_}$\footnote{The function $\textit{concat}$ takes a list of lists 
       
   985 			and merges each of the element lists to form a flattened list.}:
       
   986 \begin{lemma}\label{stupdateInduct1}
       
   987 	\mbox
       
   988 	For a list of strings $Ss$, the following hold.
       
   989 	\begin{itemize}
       
   990 		\item
       
   991 			If we do a derivative on the terms 
       
   992 			$r\backslash_r s \cdot r^*$ (where $s$ is taken from the list $Ss$),
       
   993 			the result will be the same as if we apply $\starupdate$ to $Ss$.
       
   994 			\begin{center}
       
   995 				\begin{tabular}{c}
       
   996 			$\textit{concat} \; (\map \; (\hflataux{\_} \circ ( (\_\backslash_r x)
       
   997 			\circ (\lambda s.\;\; (r \backslash_r s) \cdot r^*)))\; Ss )\;
       
   998 			$\\
       
   999 			\\
       
  1000 			$=$ \\
       
  1001 			\\
       
  1002 			$\map \; (\lambda s. (r \backslash_r s) \cdot (r^*)) \; 
       
  1003 			(\starupdate \; x \; r \; Ss)$
       
  1004 				\end{tabular}
       
  1005 			\end{center}
       
  1006 		\item
       
  1007 			$\starupdates$ is ``composable'' w.r.t a derivative.
       
  1008 			It piggybacks the character $x$ to the tail of the string
       
  1009 			$xs$.
       
  1010 			\begin{center}
       
  1011 				\begin{tabular}{c}
       
  1012 					$\textit{concat} \; (\map \; \hflataux{\_} \; 
       
  1013 					(\map \; (\_\backslash_r x) \; 
       
  1014 					(\map \; (\lambda s.\;\; (r \backslash_r s) \cdot 
       
  1015 					(r^*) ) \; (\starupdates \; xs \; r \; Ss))))$\\
       
  1016 					\\
       
  1017 					$=$\\
       
  1018 					\\
       
  1019 					$\map \; (\lambda s.\;\; (r\backslash_r s) \cdot (r^*)) \;
       
  1020 					(\starupdates \; (xs @ [x]) \; r \; Ss)$
       
  1021 				\end{tabular}
       
  1022 			\end{center}
       
  1023 	\end{itemize}
       
  1024 \end{lemma}
       
  1025 			
       
  1026 \begin{proof}
       
  1027 	Part 1 is by induction on $Ss$.
       
  1028 	Part 2 is by induction on $xs$, where $Ss$ is left to take arbitrary values.
       
  1029 \end{proof}
       
  1030 			
       
  1031 
       
  1032 Like $\textit{createdBySequence}$, we need
       
  1033 a predicate for ``star-created'' regular expressions:
       
  1034 \begin{center}
       
  1035 	\begin{mathpar}
       
  1036 		\inferrule{\mbox{}}{ \textit{createdByStar}\; \RSEQ{ra}{\RSTAR{rb}} }
       
  1037 
       
  1038 		\inferrule{  \textit{createdByStar} \; r_1\; \land  \; \textit{createdByStar} \; r_2 }{\textit{createdByStar} \; (r_1 + r_2) } 
       
  1039 	\end{mathpar}
       
  1040 \end{center}
       
  1041 \noindent
       
  1042 All regular expressions created by taking derivatives of
       
  1043 $r_1 \cdot (r_2)^*$ satisfy the $\textit{createdByStar}$ predicate:
       
  1044 \begin{lemma}\label{starDersCbs}
       
  1045 	$\textit{createdByStar} \; ((r_1 \cdot r_2^*) \backslash_r s) $ holds.
       
  1046 \end{lemma}
       
  1047 \begin{proof}
       
  1048 	By a reverse induction on $s$.
       
  1049 \end{proof}
       
  1050 If a regular expression conforms to the shape of a star's derivative,
       
  1051 then we can push an application of $\hflataux{\_}$ inside a derivative of it:
       
  1052 \begin{lemma}\label{hfauPushin}
       
  1053 	If $\textit{createdByStar} \; r$ holds, then
       
  1054 	\begin{center}
       
  1055 		$\hflataux{r \backslash_r c} = \textit{concat} \; (
       
  1056 		\map \; \hflataux{\_} (\map \; (\_\backslash_r c) \;(\hflataux{r})))$
       
  1057 	\end{center}
       
  1058 	holds.
       
  1059 \end{lemma}
       
  1060 \begin{proof}
       
  1061 	By an induction on the inductivev cases of $\textit{createdByStar}$.
       
  1062 \end{proof}
       
  1063 %This is not entirely true for annotated regular expressions: 
       
  1064 %%TODO: bsimp bders \neq bderssimp
       
  1065 %\begin{center}
  1113 %\begin{center}
  1066 %	$(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
  1114 %	\begin{tabular}{llrcllrllll}
  1067 %\end{center}
  1115 %		$r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & &&&&\\
  1068 %For bit-codes, the order in which simplification is applied
  1116 %			      &&&&$\textit{rsimp}$ & $($ & $
  1069 %might cause a difference in the location they are placed.
  1117 %			      \sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
  1070 %If we want something like
  1118 %					  &&&& & & & & $\;\; \textit{nupdates} \;$ & 
  1071 %\begin{center}
  1119 %			 		  $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
  1072 %	$\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$
  1120 %					  &&&& & & & & $)$ &\\
  1073 %\end{center}
  1121 %					  &&&& & &  $)$ & & &\\
  1074 %Some "canonicalization" procedure is required,
  1122 %					  &&&& & $)$ & & & &\\
  1075 %which either pushes all the common bitcodes to nodes
       
  1076 %as senior as possible:
       
  1077 %\begin{center}
       
  1078 %	$_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $
       
  1079 %\end{center}
       
  1080 %or does the reverse. However bitcodes are not of interest if we are talking about
       
  1081 %the $\llbracket r \rrbracket$ size of a regex.
       
  1082 %Therefore for the ease and simplicity of producing a
       
  1083 %proof for a size bound, we are happy to restrict ourselves to 
       
  1084 %unannotated regular expressions, and obtain such equalities as
       
  1085 %TODO: rsimp sflat
       
  1086 % The simplification of a flattened out regular expression, provided it comes
       
  1087 %from the derivative of a star, is the same as the one nested.
       
  1088 
       
  1089 
       
  1090 
       
  1091 Now we introduce an inductive property
       
  1092 for $\starupdate$ and $\hflataux{\_}$.
       
  1093 \begin{lemma}\label{starHfauInduct}
       
  1094 	If we do derivatives of $r^*$
       
  1095 	with a string that starts with $c$,
       
  1096 	then flatten it out,
       
  1097 	we obtain a list
       
  1098 	of the shape $\sum_{s' \in sS} (r\backslash_r s') \cdot r^*$,
       
  1099 	where $sS = \starupdates \; s \; r \; [[c]]$. Namely,
       
  1100 	\begin{center}
       
  1101 		$\hflataux{(\rders{( (\rder{c}{r_0})\cdot(r_0^*))}{s})} = 
       
  1102 		\map \; (\lambda s_1. (r_0 \backslash_r s_1) \cdot (r_0^*)) \; 
       
  1103 		(\starupdates \; s \; r_0 \; [[c]])$
       
  1104 	\end{center}
       
  1105 holds.
       
  1106 \end{lemma}
       
  1107 \begin{proof}
       
  1108 	By an induction on $s$, the inductive cases
       
  1109 	being $[]$ and $s@[c]$. The lemmas \ref{hfauPushin} and \ref{starDersCbs} are used.
       
  1110 \end{proof}
       
  1111 \noindent
       
  1112 
       
  1113 $\hflataux{\_}$ has a similar effect as $\textit{flatten}$:
       
  1114 \begin{lemma}\label{hflatauxGrewrites}
       
  1115 	$a :: rs \grewrites \hflataux{a} @ rs$
       
  1116 \end{lemma}
       
  1117 \begin{proof}
       
  1118 	By induction on $a$. $rs$ is set to take arbitrary values.
       
  1119 \end{proof}
       
  1120 It is also not surprising that $\textit{rsimp}$ will cover
       
  1121 the effect of $\hflataux{\_}$:
       
  1122 \begin{lemma}\label{cbsHfauRsimpeq1}
       
  1123 	$\rsimp{(r_1 + r_2)} = \rsimp{(\RALTS{\hflataux{r_1} @ \hflataux{r_2}})}$
       
  1124 \end{lemma}
       
  1125 
       
  1126 \begin{proof}
       
  1127 	By using the rewriting relation $\rightsquigarrow$
       
  1128 \end{proof}
       
  1129 And from this we obtain a proof that a star's derivative will be the same
       
  1130 as if it had all its nested alternatives created during deriving being flattened out:
       
  1131 For example,
       
  1132 \begin{lemma}\label{hfauRsimpeq2}
       
  1133 	$\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
       
  1134 \end{lemma}
       
  1135 \begin{proof}
       
  1136 	By structural induction on $r$, where the induction rules 
       
  1137 	are these of $\createdByStar{\_}$.
       
  1138 	Lemma \ref{cbsHfauRsimpeq1} is used in the inductive case.
       
  1139 \end{proof}
       
  1140 
       
  1141 
       
  1142 %Here is a corollary that states the lemma in
       
  1143 %a more intuitive way:
       
  1144 %\begin{corollary}
       
  1145 %	$\hflataux{r^* \backslash_r (c::xs)} = \map \; (\lambda s. (r \backslash_r s) \cdot
       
  1146 %	(r^*))\; (\starupdates \; c\; r\; [[c]])$
       
  1147 %\end{corollary}
       
  1148 %\noindent
       
  1149 %Note that this is also agnostic of the simplification
       
  1150 %function we defined, and is therefore of more general interest.
       
  1151 
       
  1152 Together with the rewriting relation
       
  1153 \begin{lemma}\label{starClosedForm6Hrewrites}
       
  1154 	We have the following set of rewriting relations or equalities:
       
  1155 	\begin{itemize}
       
  1156 		\item
       
  1157 			$\textit{rsimp} \; (r^* \backslash_r (c::s)) 
       
  1158 			\sequal
       
  1159 			\sum \; ( ( \sum (\lambda s. (r\backslash_r s) \cdot r^*) \; (
       
  1160 			\starupdates \; s \; r \; [ c::[]] ) ) )$
       
  1161 		\item
       
  1162 			$r \backslash_{rsimp} (c::s) = \textit{rsimp} \; ( (
       
  1163 			\sum ( (\map \; (\lambda s_1. (r\backslash s_1) \; r^*) \;
       
  1164 			(\starupdates \;s \; r \; [ c::[] ])))))$
       
  1165 		\item
       
  1166 			$\sum ( (\map \; (\lambda s. (r\backslash s) \; r^*) \; Ss))
       
  1167 			\sequal
       
  1168 			 \sum ( (\map \; (\lambda s. \textit{rsimp} \; (r\backslash s) \;
       
  1169 			 r^*) \; Ss) )$
       
  1170 		\item
       
  1171 			$\map \; (\lambda s. (\rsimp{r \backslash_r s}) \cdot (r^*)) \; Ss
       
  1172 			\scfrewrites
       
  1173 			\map \; (\lambda s. (\rsimp{r \backslash_r s}) \cdot (r^*)) \; Ss$
       
  1174 		\item
       
  1175 			$( ( \sum ( ( \map \ (\lambda s. \;\;
       
  1176 			(\textit{rsimp} \; (r \backslash_r s)) \cdot r^*) \; (\starupdates \;
       
  1177 			s \; r \; [ c::[] ])))))$\\
       
  1178 			$\sequal$\\
       
  1179 			$( ( \sum ( ( \map \ (\lambda s. \;\;
       
  1180 			( r \backslash_{rsimp} s)) \cdot r^*) \; (\starupdates \;
       
  1181 			s \; r \; [ c::[] ]))))$\\
       
  1182 	\end{itemize}
       
  1183 \end{lemma}
       
  1184 \begin{proof}
       
  1185 	Part 1 leads to part 2.
       
  1186 	The rest of them are routine.
       
  1187 \end{proof}
       
  1188 \noindent
       
  1189 Next the closed form for star regular expressions can be derived:
       
  1190 \begin{theorem}\label{starClosedForm}
       
  1191 	$\rderssimp{r^*}{c::s} = 
       
  1192 	\rsimp{
       
  1193 		(\sum (\map \; (\lambda s. (\rderssimp{r}{s})\cdot r^*) \; 
       
  1194 		(\starupdates \; s\; r \; [[c]])
       
  1195 		)
       
  1196 		)
       
  1197 	}
       
  1198 	$
       
  1199 \end{theorem}
       
  1200 \begin{proof}
       
  1201 	By an induction on $s$.
       
  1202 	The lemmas \ref{rsimpIdem}, \ref{starHfauInduct}, \ref{starClosedForm6Hrewrites} 
       
  1203 	and \ref{hfauRsimpeq2}
       
  1204 	are used.	
       
  1205 	In \ref{starClosedForm6Hrewrites}, the equalities are
       
  1206 	used to link the LHS and RHS.
       
  1207 \end{proof}
       
  1208 
       
  1209 
       
  1210 
       
  1211 
       
  1212 The closed form for them looks like:
       
  1213 %\begin{center}
       
  1214 %	\begin{tabular}{llrclll}
       
  1215 %		$r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & \\
       
  1216 %		$\textit{rsimp}$ & $($ & $
       
  1217 %		\sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
       
  1218 %			 & & & & $\textit{nupdates} \;$ & 
       
  1219 %			 $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
       
  1220 %			 & & & & $)$ &\\
       
  1221 %			 & &  $)$ & & &\\
       
  1222 %			 & $)$ & & & &\\
       
  1223 %	\end{tabular}
  1123 %	\end{tabular}
  1224 %\end{center}
  1124 %\end{center}
  1225 \begin{center}
  1125 %The $\textit{optermsimp}$ function with the argument $r$ 
  1226 	\begin{tabular}{llrcllrllll}
  1126 %chooses from two options: $\ZERO$ or 
  1227 		$r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & &&&&\\
  1127 %We define for the $r^{\{n\}}$ constructor something similar to $\starupdate$
  1228 			      &&&&$\textit{rsimp}$ & $($ & $
  1128 %and $\starupdates$:
  1229 			      \sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
  1129 %\begin{center}
  1230 					  &&&& & & & & $\;\; \textit{nupdates} \;$ & 
  1130 %	\begin{tabular}{lcl}
  1231 			 		  $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
  1131 %		$\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
  1232 					  &&&& & & & & $)$ &\\
  1132 %		$\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\
  1233 					  &&&& & &  $)$ & & &\\
  1133 %						     & & $\textit{if} \; 
  1234 					  &&&& & $)$ & & & &\\
  1134 %						     (\rnullable \; (\rders \; r \; s))$ \\
  1235 	\end{tabular}
  1135 %						     & & $\textit{then} \;\; (s @ [c]) :: [c] :: (
  1236 \end{center}
  1136 %						     \starupdate \; c \; r \; Ss)$ \\
  1237 The $\textit{optermsimp}$ function with the argument $r$ 
  1137 %						     & & $\textit{else} \;\; (s @ [c]) :: (
  1238 chooses from two options: $\ZERO$ or 
  1138 %						     \starupdate \; c \; r \; Ss)$
  1239 We define for the $r^{\{n\}}$ constructor something similar to $\starupdate$
  1139 %	\end{tabular}
  1240 and $\starupdates$:
  1140 %\end{center}
  1241 \begin{center}
  1141 %\noindent
  1242 	\begin{tabular}{lcl}
  1142 %As a generalisation from characters to strings,
  1243 		$\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
  1143 %$\starupdates$ takes a string instead of a character
  1244 		$\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\
  1144 %as the first input argument, and is otherwise the same
  1245 						     & & $\textit{if} \; 
  1145 %as $\starupdate$.
  1246 						     (\rnullable \; (\rders \; r \; s))$ \\
  1146 %\begin{center}
  1247 						     & & $\textit{then} \;\; (s @ [c]) :: [c] :: (
  1147 %	\begin{tabular}{lcl}
  1248 						     \starupdate \; c \; r \; Ss)$ \\
  1148 %		$\starupdates \; [] \; r \; Ss$ & $=$ & $Ss$\\
  1249 						     & & $\textit{else} \;\; (s @ [c]) :: (
  1149 %		$\starupdates \; (c :: cs) \; r \; Ss$ &  $=$ &  $\starupdates \; cs \; r \; (
  1250 						     \starupdate \; c \; r \; Ss)$
  1150 %		\starupdate \; c \; r \; Ss)$
  1251 	\end{tabular}
  1151 %	\end{tabular}
  1252 \end{center}
  1152 %\end{center}
  1253 \noindent
  1153 %\noindent
  1254 As a generalisation from characters to strings,
  1154 
  1255 $\starupdates$ takes a string instead of a character
  1155 
  1256 as the first input argument, and is otherwise the same
  1156 
  1257 as $\starupdate$.
  1157 %\section{Zippers}
  1258 \begin{center}
  1158 %Zipper is a data structure designed to operate on 
  1259 	\begin{tabular}{lcl}
  1159 %and navigate between local parts of a tree.
  1260 		$\starupdates \; [] \; r \; Ss$ & $=$ & $Ss$\\
  1160 %It was first formally described by Huet \cite{HuetZipper}.
  1261 		$\starupdates \; (c :: cs) \; r \; Ss$ &  $=$ &  $\starupdates \; cs \; r \; (
  1161 %Typical applications of zippers involve text editor buffers
  1262 		\starupdate \; c \; r \; Ss)$
  1162 %and proof system databases.
  1263 	\end{tabular}
  1163 %In our setting, the idea is to compactify the representation
  1264 \end{center}
  1164 %of derivatives with zippers, thereby making our algorithm faster.
  1265 \noindent
  1165 %Some initial results 
  1266 
  1166 %We first give a brief introduction to what zippers are,
  1267 
  1167 %and other works
       
  1168 %that apply zippers to derivatives
       
  1169 %When dealing with large trees, it would be a waste to 
       
  1170 %traverse the entire tree if
       
  1171 %the operation only
       
  1172 %involves a small fraction of it.
       
  1173 %The idea is to put the focus on that subtree, turning other parts
       
  1174 %of the tree into a context
       
  1175 %
       
  1176 %
       
  1177 %One observation about our derivative-based lexing algorithm is that
       
  1178 %the derivative operation sometimes traverses the entire regular expression
       
  1179 %unnecessarily:
  1268 
  1180 
  1269 
  1181 
  1270 %----------------------------------------------------------------------------------------
  1182 %----------------------------------------------------------------------------------------
  1271 %	SECTION 1
  1183 %	SECTION 1
  1272 %----------------------------------------------------------------------------------------
  1184 %----------------------------------------------------------------------------------------