495 $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$, |
495 $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$, |
496 |
496 |
497 |
497 |
498 |
498 |
499 |
499 |
500 \section{The NTIMES Constructor, and the Size Bound and Correctness Proof for it} |
|
501 The NTIMES construct has the following closed form: |
|
502 \begin{verbatim} |
|
503 "rders_simp (RNTIMES r0 (Suc n)) (c#s) = |
|
504 rsimp ( RALTS ( (map (optermsimp r0 ) (nupdates s r0 [Some ([c], n)]) ) ))" |
|
505 \end{verbatim} |
|
506 %---------------------------------------------------------------------------------------- |
|
507 % SECTION 1 |
|
508 %---------------------------------------------------------------------------------------- |
|
509 |
|
510 \section{Adding Support for the Negation Construct, and its Correctness Proof} |
|
511 We now add support for the negation regular expression: |
|
512 \[ r ::= \ZERO \mid \ONE |
|
513 \mid c |
|
514 \mid r_1 \cdot r_2 |
|
515 \mid r_1 + r_2 |
|
516 \mid r^* |
|
517 \mid \sim r |
|
518 \] |
|
519 The $\textit{nullable}$ function's clause for it would be |
|
520 \[ |
|
521 \textit{nullable}(~r) = \neg \nullable(r) |
|
522 \] |
|
523 The derivative would be |
|
524 \[ |
|
525 ~r \backslash c = ~ (r \backslash c) |
|
526 \] |
|
527 |
|
528 The most tricky part of lexing for the $~r$ regular expression |
|
529 is creating a value for it. |
|
530 For other regular expressions, the value aligns with the |
|
531 structure of the regular expression: |
|
532 \[ |
|
533 \vdash \Seq(\Char(a), \Char(b)) : a \cdot b |
|
534 \] |
|
535 But for the $~r$ regular expression, $s$ is a member of it if and only if |
|
536 $s$ does not belong to $L(r)$. |
|
537 That means when there |
|
538 is a match for the not regular expression, it is not possible to generate how the string $s$ matched |
|
539 with $r$. |
|
540 What we can do is preserve the information of how $s$ was not matched by $r$, |
|
541 and there are a number of options to do this. |
|
542 |
|
543 We could give a partial value when there is a partial match for the regular expression inside |
|
544 the $\mathbf{not}$ construct. |
|
545 For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$, |
|
546 A value for it could be |
|
547 \[ |
|
548 \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c) |
|
549 \] |
|
550 The above example demonstrates what value to construct |
|
551 when the string $s$ is at most a real prefix |
|
552 of the strings in $L(r)$. When $s$ instead is not a prefix of any strings |
|
553 in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$ |
|
554 constructor. |
|
555 |
|
556 Another option would be to either store the string $s$ that resulted in |
|
557 a mis-match for $r$ or a dummy value as a placeholder: |
|
558 \[ |
|
559 \vdash \textit{Not}(abcd) : ~( r_1 ) |
|
560 \] |
|
561 or |
|
562 \[ |
|
563 \vdash \textit{Not}(\textit{Dummy}) : ~( r_1 ) |
|
564 \] |
|
565 We choose to implement this as it is most straightforward: |
|
566 \[ |
|
567 \mkeps(~(r)) = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy}) |
|
568 \] |
|
569 |
|
570 %---------------------------------------------------------------------------------------- |
500 %---------------------------------------------------------------------------------------- |
571 % SECTION 2 |
501 % SECTION 2 |
572 %---------------------------------------------------------------------------------------- |
502 %---------------------------------------------------------------------------------------- |
573 |
503 |
574 \section{Bounded Repetitions} |
504 \section{Bounded Repetitions} |
575 We define for the $r^{n}$ constructor something similar to $\starupdate$ |
505 We have promised in chapter \ref{Introduction} |
|
506 that our lexing algorithm can potentially be extended |
|
507 to handle bounded repetitions |
|
508 in natural and elegant ways. |
|
509 Now we fulfill our promise by adding support for |
|
510 the ``exactly-$n$-times'' bounded regular expression $r^{\{n\}}$. |
|
511 We add clauses in our derivatives-based lexing algorithms (with simplifications) |
|
512 introduced in chapter \ref{Bitcoded2}. |
|
513 |
|
514 \subsection{Augmented Definitions} |
|
515 There are a number of definitions that needs to be augmented. |
|
516 The most notable one would be the POSIX rules for $r^{\{n\}}$: |
|
517 \begin{mathpar} |
|
518 \inferrule{v \in vs}{\textit{Stars} place holder} |
|
519 \end{mathpar} |
|
520 |
|
521 |
|
522 \subsection{Proofs for the Augmented Lexing Algorithm} |
|
523 We need to maintain two proofs with the additional $r^{\{n\}}$ |
|
524 construct: the |
|
525 correctness proof in chapter \ref{Bitcoded2}, |
|
526 and the finiteness proof in chapter \ref{Finite}. |
|
527 |
|
528 \subsubsection{Correctness Proof Augmentation} |
|
529 The correctness of $\textit{lexer}$ and $\textit{blexer}$ with bounded repetitions |
|
530 have been proven by Ausaf and Urban\cite{AusafDyckhoffUrban2016}. |
|
531 As they have commented, once the definitions are in place, |
|
532 the proofs given for the basic regular expressions will extend to |
|
533 bounded regular expressions, and there are no ``surprises''. |
|
534 We confirm this point because the correctness theorem would also |
|
535 extend without surprise to $\blexersimp$. |
|
536 The rewrite rules such as $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$ and so on |
|
537 do not need to be changed, |
|
538 and only a few lemmas such as lemma \ref{fltsPreserves} need to be adjusted to |
|
539 add one more line which can be solved by sledgehammer |
|
540 to solve the $r^{\{n\}}$ inductive case. |
|
541 |
|
542 |
|
543 \subsubsection{Finiteness Proof Augmentation} |
|
544 The bounded repetitions can be handled |
|
545 using techniques similar to stars. |
|
546 The closed form for them looks like: |
|
547 %\begin{center} |
|
548 % \begin{tabular}{llrclll} |
|
549 % $r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & \\ |
|
550 % $\textit{rsimp}$ & $($ & $ |
|
551 % \sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\ |
|
552 % & & & & $\textit{nupdates} \;$ & |
|
553 % $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\ |
|
554 % & & & & $)$ &\\ |
|
555 % & & $)$ & & &\\ |
|
556 % & $)$ & & & &\\ |
|
557 % \end{tabular} |
|
558 %\end{center} |
|
559 \begin{center} |
|
560 \begin{tabular}{llrcllrllll} |
|
561 $r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & &&&&\\ |
|
562 &&&&$\textit{rsimp}$ & $($ & $ |
|
563 \sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\ |
|
564 &&&& & & & & $\;\; \textit{nupdates} \;$ & |
|
565 $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\ |
|
566 &&&& & & & & $)$ &\\ |
|
567 &&&& & & $)$ & & &\\ |
|
568 &&&& & $)$ & & & &\\ |
|
569 \end{tabular} |
|
570 \end{center} |
|
571 We define for the $r^{\{n\}}$ constructor something similar to $\starupdate$ |
576 and $\starupdates$: |
572 and $\starupdates$: |
577 \begin{center} |
573 \begin{center} |
578 \begin{tabular}{lcl} |
574 \begin{tabular}{lcl} |
579 $\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\ |
575 $\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\ |
580 $\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\ |
576 $\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\ |
598 \starupdate \; c \; r \; Ss)$ |
594 \starupdate \; c \; r \; Ss)$ |
599 \end{tabular} |
595 \end{tabular} |
600 \end{center} |
596 \end{center} |
601 \noindent |
597 \noindent |
602 |
598 |
603 |
599 %---------------------------------------------------------------------------------------- |
|
600 % SECTION 1 |
|
601 %---------------------------------------------------------------------------------------- |
|
602 |
|
603 %\section{Adding Support for the Negation Construct, and its Correctness Proof} |
|
604 %We now add support for the negation regular expression: |
|
605 %\[ r ::= \ZERO \mid \ONE |
|
606 % \mid c |
|
607 % \mid r_1 \cdot r_2 |
|
608 % \mid r_1 + r_2 |
|
609 % \mid r^* |
|
610 % \mid \sim r |
|
611 %\] |
|
612 %The $\textit{nullable}$ function's clause for it would be |
|
613 %\[ |
|
614 %\textit{nullable}(~r) = \neg \nullable(r) |
|
615 %\] |
|
616 %The derivative would be |
|
617 %\[ |
|
618 %~r \backslash c = ~ (r \backslash c) |
|
619 %\] |
|
620 % |
|
621 %The most tricky part of lexing for the $~r$ regular expression |
|
622 % is creating a value for it. |
|
623 % For other regular expressions, the value aligns with the |
|
624 % structure of the regular expression: |
|
625 % \[ |
|
626 % \vdash \Seq(\Char(a), \Char(b)) : a \cdot b |
|
627 % \] |
|
628 %But for the $~r$ regular expression, $s$ is a member of it if and only if |
|
629 %$s$ does not belong to $L(r)$. |
|
630 %That means when there |
|
631 %is a match for the not regular expression, it is not possible to generate how the string $s$ matched |
|
632 %with $r$. |
|
633 %What we can do is preserve the information of how $s$ was not matched by $r$, |
|
634 %and there are a number of options to do this. |
|
635 % |
|
636 %We could give a partial value when there is a partial match for the regular expression inside |
|
637 %the $\mathbf{not}$ construct. |
|
638 %For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$, |
|
639 %A value for it could be |
|
640 % \[ |
|
641 % \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c) |
|
642 % \] |
|
643 % The above example demonstrates what value to construct |
|
644 % when the string $s$ is at most a real prefix |
|
645 % of the strings in $L(r)$. When $s$ instead is not a prefix of any strings |
|
646 % in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$ |
|
647 % constructor. |
|
648 % |
|
649 % Another option would be to either store the string $s$ that resulted in |
|
650 % a mis-match for $r$ or a dummy value as a placeholder: |
|
651 % \[ |
|
652 % \vdash \textit{Not}(abcd) : ~( r_1 ) |
|
653 % \] |
|
654 %or |
|
655 % \[ |
|
656 % \vdash \textit{Not}(\textit{Dummy}) : ~( r_1 ) |
|
657 % \] |
|
658 % We choose to implement this as it is most straightforward: |
|
659 % \[ |
|
660 % \mkeps(~(r)) = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy}) |
|
661 % \] |
|
662 % |
|
663 % |