ChengsongTanPhdThesis/Chapters/Cubic.tex
changeset 612 8c234a1bc7e0
parent 596 b306628a0eab
child 613 b0f0d884a547
equal deleted inserted replaced
611:bc1df466150a 612:8c234a1bc7e0
   495  $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
   495  $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
   496 
   496 
   497 
   497 
   498 
   498 
   499 
   499 
   500 \section{The NTIMES Constructor, and the Size Bound and Correctness Proof for it}
       
   501 The NTIMES construct has the following closed form:
       
   502 \begin{verbatim}
       
   503 "rders_simp (RNTIMES r0 (Suc n)) (c#s) = 
       
   504 rsimp ( RALTS ( (map (optermsimp r0 ) (nupdates s r0 [Some ([c], n)]) ) ))"
       
   505 \end{verbatim}
       
   506 %----------------------------------------------------------------------------------------
       
   507 %	SECTION 1
       
   508 %----------------------------------------------------------------------------------------
       
   509 
       
   510 \section{Adding Support for the Negation Construct, and its Correctness Proof}
       
   511 We now add support for the negation regular expression:
       
   512 \[			r ::=   \ZERO \mid  \ONE
       
   513 			 \mid  c  
       
   514 			 \mid  r_1 \cdot r_2
       
   515 			 \mid  r_1 + r_2   
       
   516 			 \mid r^*    
       
   517 			 \mid \sim r
       
   518 \]
       
   519 The $\textit{nullable}$ function's clause for it would be 
       
   520 \[
       
   521 \textit{nullable}(~r) = \neg \nullable(r)
       
   522 \]
       
   523 The derivative would be
       
   524 \[
       
   525 ~r \backslash c = ~ (r \backslash c)
       
   526 \]
       
   527  
       
   528 The most tricky part of lexing for the $~r$ regular expression
       
   529  is creating a value for it.
       
   530  For other regular expressions, the value aligns with the 
       
   531  structure of the regular expression:
       
   532  \[
       
   533  \vdash \Seq(\Char(a), \Char(b)) : a \cdot b
       
   534  \]
       
   535 But for the $~r$ regular expression, $s$ is a member of it if and only if
       
   536 $s$ does not belong to $L(r)$. 
       
   537 That means when there
       
   538 is a match for the not regular expression, it is not possible to generate how the string $s$ matched
       
   539 with $r$. 
       
   540 What we can do is preserve the information of how $s$ was not matched by $r$,
       
   541 and there are a number of options to do this.
       
   542 
       
   543 We could give a partial value when there is a partial match for the regular expression inside
       
   544 the $\mathbf{not}$ construct.
       
   545 For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$,
       
   546 A value for it could be 
       
   547  \[
       
   548  \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c)
       
   549  \]
       
   550  The above example demonstrates what value to construct 
       
   551  when the string $s$ is at most a real prefix
       
   552  of the strings in $L(r)$. When $s$ instead is not a prefix of any strings
       
   553  in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$
       
   554  constructor.
       
   555  
       
   556  Another option would be to either store the string $s$ that resulted in 
       
   557  a mis-match for $r$ or a dummy value as a placeholder:
       
   558  \[
       
   559  \vdash \textit{Not}(abcd) : ~( r_1 )
       
   560  \]
       
   561 or
       
   562  \[
       
   563  \vdash \textit{Not}(\textit{Dummy}) : ~( r_1 )
       
   564  \] 
       
   565  We choose to implement this as it is most straightforward:
       
   566  \[
       
   567  \mkeps(~(r))  = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
       
   568  \]
       
   569  
       
   570 %----------------------------------------------------------------------------------------
   500 %----------------------------------------------------------------------------------------
   571 %	SECTION 2
   501 %	SECTION 2
   572 %----------------------------------------------------------------------------------------
   502 %----------------------------------------------------------------------------------------
   573 
   503 
   574 \section{Bounded Repetitions}
   504 \section{Bounded Repetitions}
   575 We define for the $r^{n}$ constructor something similar to $\starupdate$
   505 We have promised in chapter \ref{Introduction}
       
   506 that our lexing algorithm can potentially be extended
       
   507 to handle bounded repetitions
       
   508 in natural and elegant ways.
       
   509 Now we fulfill our promise by adding support for 
       
   510 the ``exactly-$n$-times'' bounded regular expression $r^{\{n\}}$.
       
   511 We add clauses in our derivatives-based lexing algorithms (with simplifications)
       
   512 introduced in chapter \ref{Bitcoded2}.
       
   513 
       
   514 \subsection{Augmented Definitions}
       
   515 There are a number of definitions that needs to be augmented.
       
   516 The most notable one would be the POSIX rules for $r^{\{n\}}$:
       
   517 \begin{mathpar}
       
   518 	\inferrule{v \in vs}{\textit{Stars} place holder}
       
   519 \end{mathpar}
       
   520 
       
   521 
       
   522 \subsection{Proofs for the Augmented Lexing Algorithm}
       
   523 We need to maintain two proofs with the additional $r^{\{n\}}$
       
   524 construct: the 
       
   525 correctness proof in chapter \ref{Bitcoded2},
       
   526 and the finiteness proof in chapter \ref{Finite}.
       
   527 
       
   528 \subsubsection{Correctness Proof Augmentation}
       
   529 The correctness of $\textit{lexer}$ and $\textit{blexer}$ with bounded repetitions
       
   530 have been proven by Ausaf and Urban\cite{AusafDyckhoffUrban2016}.
       
   531 As they have commented, once the definitions are in place,
       
   532 the proofs given for the basic regular expressions will extend to
       
   533 bounded regular expressions, and there are no ``surprises''.
       
   534 We confirm this point because the correctness theorem would also
       
   535 extend without surprise to $\blexersimp$.
       
   536 The rewrite rules such as $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$ and so on
       
   537 do not need to be changed,
       
   538 and only a few lemmas such as lemma \ref{fltsPreserves} need to be adjusted to 
       
   539 add one more line which can be solved by sledgehammer 
       
   540 to solve the $r^{\{n\}}$ inductive case.
       
   541 
       
   542 
       
   543 \subsubsection{Finiteness Proof Augmentation}
       
   544 The bounded repetitions can be handled
       
   545 using techniques similar to stars.
       
   546 The closed form for them looks like:
       
   547 %\begin{center}
       
   548 %	\begin{tabular}{llrclll}
       
   549 %		$r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & \\
       
   550 %		$\textit{rsimp}$ & $($ & $
       
   551 %		\sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
       
   552 %			 & & & & $\textit{nupdates} \;$ & 
       
   553 %			 $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
       
   554 %			 & & & & $)$ &\\
       
   555 %			 & &  $)$ & & &\\
       
   556 %			 & $)$ & & & &\\
       
   557 %	\end{tabular}
       
   558 %\end{center}
       
   559 \begin{center}
       
   560 	\begin{tabular}{llrcllrllll}
       
   561 		$r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & &&&&\\
       
   562 			      &&&&$\textit{rsimp}$ & $($ & $
       
   563 			      \sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
       
   564 					  &&&& & & & & $\;\; \textit{nupdates} \;$ & 
       
   565 			 		  $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
       
   566 					  &&&& & & & & $)$ &\\
       
   567 					  &&&& & &  $)$ & & &\\
       
   568 					  &&&& & $)$ & & & &\\
       
   569 	\end{tabular}
       
   570 \end{center}
       
   571 We define for the $r^{\{n\}}$ constructor something similar to $\starupdate$
   576 and $\starupdates$:
   572 and $\starupdates$:
   577 \begin{center}
   573 \begin{center}
   578 	\begin{tabular}{lcl}
   574 	\begin{tabular}{lcl}
   579 		$\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
   575 		$\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
   580 		$\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\
   576 		$\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\
   598 		\starupdate \; c \; r \; Ss)$
   594 		\starupdate \; c \; r \; Ss)$
   599 	\end{tabular}
   595 	\end{tabular}
   600 \end{center}
   596 \end{center}
   601 \noindent
   597 \noindent
   602 
   598 
   603 
   599 %----------------------------------------------------------------------------------------
       
   600 %	SECTION 1
       
   601 %----------------------------------------------------------------------------------------
       
   602 
       
   603 %\section{Adding Support for the Negation Construct, and its Correctness Proof}
       
   604 %We now add support for the negation regular expression:
       
   605 %\[			r ::=   \ZERO \mid  \ONE
       
   606 %			 \mid  c  
       
   607 %			 \mid  r_1 \cdot r_2
       
   608 %			 \mid  r_1 + r_2   
       
   609 %			 \mid r^*    
       
   610 %			 \mid \sim r
       
   611 %\]
       
   612 %The $\textit{nullable}$ function's clause for it would be 
       
   613 %\[
       
   614 %\textit{nullable}(~r) = \neg \nullable(r)
       
   615 %\]
       
   616 %The derivative would be
       
   617 %\[
       
   618 %~r \backslash c = ~ (r \backslash c)
       
   619 %\]
       
   620 % 
       
   621 %The most tricky part of lexing for the $~r$ regular expression
       
   622 % is creating a value for it.
       
   623 % For other regular expressions, the value aligns with the 
       
   624 % structure of the regular expression:
       
   625 % \[
       
   626 % \vdash \Seq(\Char(a), \Char(b)) : a \cdot b
       
   627 % \]
       
   628 %But for the $~r$ regular expression, $s$ is a member of it if and only if
       
   629 %$s$ does not belong to $L(r)$. 
       
   630 %That means when there
       
   631 %is a match for the not regular expression, it is not possible to generate how the string $s$ matched
       
   632 %with $r$. 
       
   633 %What we can do is preserve the information of how $s$ was not matched by $r$,
       
   634 %and there are a number of options to do this.
       
   635 %
       
   636 %We could give a partial value when there is a partial match for the regular expression inside
       
   637 %the $\mathbf{not}$ construct.
       
   638 %For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$,
       
   639 %A value for it could be 
       
   640 % \[
       
   641 % \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c)
       
   642 % \]
       
   643 % The above example demonstrates what value to construct 
       
   644 % when the string $s$ is at most a real prefix
       
   645 % of the strings in $L(r)$. When $s$ instead is not a prefix of any strings
       
   646 % in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$
       
   647 % constructor.
       
   648 % 
       
   649 % Another option would be to either store the string $s$ that resulted in 
       
   650 % a mis-match for $r$ or a dummy value as a placeholder:
       
   651 % \[
       
   652 % \vdash \textit{Not}(abcd) : ~( r_1 )
       
   653 % \]
       
   654 %or
       
   655 % \[
       
   656 % \vdash \textit{Not}(\textit{Dummy}) : ~( r_1 )
       
   657 % \] 
       
   658 % We choose to implement this as it is most straightforward:
       
   659 % \[
       
   660 % \mkeps(~(r))  = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
       
   661 % \]
       
   662 % 
       
   663 %