153 For @{text "iffoo"} we obtain by the longest match rule a single identifier |
153 For @{text "iffoo"} we obtain by the longest match rule a single identifier |
154 token, not a keyword followed by an identifier. For @{text "if"} we obtain by |
154 token, not a keyword followed by an identifier. For @{text "if"} we obtain by |
155 the priority rule a keyword token, not an identifier token---even if @{text |
155 the priority rule a keyword token, not an identifier token---even if @{text |
156 "r\<^bsub>id\<^esub>"} matches also.\bigskip |
156 "r\<^bsub>id\<^esub>"} matches also.\bigskip |
157 |
157 |
158 \noindent {\bf Contributions:} (NOT DONE YET) We have implemented in |
158 \noindent {\bf Contributions:} We have implemented in Isabelle/HOL the |
159 Isabelle/HOL the derivative-based regular expression matching algorithm as |
159 derivative-based regular expression matching algorithm as described by |
160 described by Sulzmann and Lu \cite{Sulzmann2014}. We have proved the |
160 Sulzmann and Lu \cite{Sulzmann2014}. We have proved the correctness of this |
161 correctness of this algorithm according to our specification of what a POSIX |
161 algorithm according to our specification of what a POSIX value is. Sulzmann |
162 value is. Sulzmann and Lu sketch in \cite{Sulzmann2014} an informal |
162 and Lu sketch in \cite{Sulzmann2014} an informal correctness proof: but to |
163 correctness proof: but to us it contains unfillable gaps. |
163 us it contains unfillable gaps.\footnote{An extended version of |
164 |
164 \cite{Sulzmann2014} is available at the website of its first author; this |
165 informal correctness proof given in \cite{Sulzmann2014} is in final |
165 extended version already includes remarks in the appendix that their |
166 form\footnote{} and to us contains unfillable gaps. |
166 informal proof contains gaps, and possible fixes are not fully worked out.} |
167 |
|
168 Our specification of a POSIX value consists of a simple inductive definition |
167 Our specification of a POSIX value consists of a simple inductive definition |
169 that given a string and a regular expression uniquely determines this value. |
168 that given a string and a regular expression uniquely determines this value. |
170 Derivatives as calculated by Brzozowski's method are usually more complex |
169 Derivatives as calculated by Brzozowski's method are usually more complex |
171 regular expressions than the initial one; various optimisations are |
170 regular expressions than the initial one; various optimisations are |
172 possible, such as the simplifications of @{term "ALT ZERO r"}, @{term "ALT r |
171 possible. We prove the correctness when simplifications of @{term "ALT ZERO |
173 ZERO"}, @{term "SEQ ONE r"} and @{term "SEQ r ONE"} to @{term r}. One of the |
172 r"}, @{term "ALT r ZERO"}, @{term "SEQ ONE r"} and @{term "SEQ r ONE"} to |
174 advantages of having a simple specification and correctness proof is that |
173 @{term r} are applied. |
175 the latter can be refined to allow for such optimisations and simple |
174 |
176 correctness proof. |
175 %An extended version of \cite{Sulzmann2014} is available at the website of |
177 |
176 %its first author; this includes some ``proofs'', claimed in |
178 An extended version of \cite{Sulzmann2014} is available at the website of |
177 %\cite{Sulzmann2014} to be ``rigorous''. Since these are evidently not in |
179 its first author; this includes some ``proofs'', claimed in |
178 %final form, we make no comment thereon, preferring to give general reasons |
180 \cite{Sulzmann2014} to be ``rigorous''. Since these are evidently not in |
179 %for our belief that the approach of \cite{Sulzmann2014} is problematic |
181 final form, we make no comment thereon, preferring to give general reasons |
180 %rather than to discuss details of unpublished work. |
182 for our belief that the approach of \cite{Sulzmann2014} is problematic |
|
183 rather than to discuss details of unpublished work. |
|
184 |
181 |
185 *} |
182 *} |
186 |
183 |
187 section {* Preliminaries *} |
184 section {* Preliminaries *} |
188 |
185 |
269 @{thm (lhs) nullable.simps(2)} & $\dn$ & @{thm (rhs) nullable.simps(2)}\\ |
266 @{thm (lhs) nullable.simps(2)} & $\dn$ & @{thm (rhs) nullable.simps(2)}\\ |
270 @{thm (lhs) nullable.simps(3)} & $\dn$ & @{thm (rhs) nullable.simps(3)}\\ |
267 @{thm (lhs) nullable.simps(3)} & $\dn$ & @{thm (rhs) nullable.simps(3)}\\ |
271 @{thm (lhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\ |
268 @{thm (lhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\ |
272 @{thm (lhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\ |
269 @{thm (lhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\ |
273 @{thm (lhs) nullable.simps(6)} & $\dn$ & @{thm (rhs) nullable.simps(6)}\medskip\\ |
270 @{thm (lhs) nullable.simps(6)} & $\dn$ & @{thm (rhs) nullable.simps(6)}\medskip\\ |
|
271 \end{tabular} |
|
272 \end{center} |
|
273 |
|
274 \begin{center} |
|
275 \begin{tabular}{lcl} |
274 @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\ |
276 @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\ |
275 @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\ |
277 @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\ |
276 @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\ |
278 @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\ |
277 @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\ |
279 @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\ |
278 @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]}\\ |
280 @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]}\\ |
524 not allow is to build this constraint explicitly into our function |
526 not allow is to build this constraint explicitly into our function |
525 definition.\footnote{Sulzmann and Lu state this clause as @{thm (lhs) |
527 definition.\footnote{Sulzmann and Lu state this clause as @{thm (lhs) |
526 injval.simps(1)[of "c" "c"]} $\dn$ @{thm (rhs) injval.simps(1)[of "c"]}, |
528 injval.simps(1)[of "c" "c"]} $\dn$ @{thm (rhs) injval.simps(1)[of "c"]}, |
527 but our deviation is harmless.} |
529 but our deviation is harmless.} |
528 |
530 |
529 The idea of @{term inj} to ``inject back'' a character into a value can |
531 The idea of the @{term inj}-function to ``inject'' a character, say |
530 be made precise by the first part of the following lemma; the second |
532 @{term c}, into a value can be made precise by the first part of the |
531 part shows that the underlying string of an @{const mkeps}-value is |
533 following lemma, which shows that the underlying string of an injected |
532 always the empty string. |
534 value has a prepend character @{term c}; the second part shows that the |
|
535 underlying string of an @{const mkeps}-value is always the empty string |
|
536 (given the regular expression is nullable since otherwise @{text mkeps} |
|
537 might not be defined). |
533 |
538 |
534 \begin{lemma}\mbox{}\smallskip\\\label{Prf_injval_flat} |
539 \begin{lemma}\mbox{}\smallskip\\\label{Prf_injval_flat} |
535 \begin{tabular}{ll} |
540 \begin{tabular}{ll} |
536 (1) & @{thm[mode=IfThen] Prf_injval_flat}\\ |
541 (1) & @{thm[mode=IfThen] Prf_injval_flat}\\ |
537 (2) & @{thm[mode=IfThen] mkeps_flat} |
542 (2) & @{thm[mode=IfThen] mkeps_flat} |
538 \end{tabular} |
543 \end{tabular} |
539 \end{lemma} |
544 \end{lemma} |
540 |
545 |
541 \begin{proof} |
546 \begin{proof} |
542 Both properties are by routine inductions: the first one, for example, |
547 Both properties are by routine inductions: the first one can, for example, |
543 by an induction over the definition of @{term derivatives}; the second by |
548 be proved by an induction over the definition of @{term derivatives}; the second by |
544 induction on @{term r}. There are no interesting cases.\qed |
549 an induction on @{term r}. There are no interesting cases.\qed |
545 \end{proof} |
550 \end{proof} |
546 |
551 |
547 Having defined the @{const mkeps} and @{text inj} function we can extend |
552 Having defined the @{const mkeps} and @{text inj} function we can extend |
548 \Brz's matcher so that a [lexical] value is constructed (assuming the |
553 \Brz's matcher so that a [lexical] value is constructed (assuming the |
549 regular expression matches the string). The clauses of the lexer are |
554 regular expression matches the string). The clauses of the lexer are |
601 \end{tabular} |
606 \end{tabular} |
602 \end{center} |
607 \end{center} |
603 |
608 |
604 \noindent We claim that this relation captures the idea behind the two |
609 \noindent We claim that this relation captures the idea behind the two |
605 informal POSIX rules shown in the Introduction: Consider for example the |
610 informal POSIX rules shown in the Introduction: Consider for example the |
606 rules @{text "P+L"} and @{text "P+R"} where the POSIX value for an |
611 rules @{text "P+L"} and @{text "P+R"} where the POSIX value for a string |
607 alternative regular expression is specified---it is always a @{text |
612 and an alternative regular expression, that is @{term "(s, ALT r\<^sub>1 r\<^sub>2)"}, |
608 "Left"}-value, \emph{except} when the string to be matched is not in the |
613 is specified---it is always a @{text "Left"}-value, \emph{except} when the |
609 language of @{term "r\<^sub>1"}; only then it is a @{text Right}-value (see the |
614 string to be matched is not in the language of @{term "r\<^sub>1"}; only then it |
610 side-condition in @{text "P+R"}). Interesting is also the rule for |
615 is a @{text Right}-value (see the side-condition in @{text "P+R"}). |
611 sequence regular expressions (@{text "PS"}). The first two premises state |
616 Interesting is also the rule for sequence regular expressions (@{text |
612 that @{term "v\<^sub>1"} and @{term "v\<^sub>2"} are the POSIX values for |
617 "PS"}). The first two premises state that @{term "v\<^sub>1"} and @{term "v\<^sub>2"} |
613 @{term "(s\<^sub>1, r\<^sub>1)"} and @{term "(s\<^sub>2, r\<^sub>2)"} |
618 are the POSIX values for @{term "(s\<^sub>1, r\<^sub>1)"} and @{term "(s\<^sub>2, r\<^sub>2)"} |
614 respectively. Consider now the third premise and note that the POSIX value |
619 respectively. Consider now the third premise and note that the POSIX value |
615 of this rule should match the string @{term "s\<^sub>1 @ s\<^sub>2"}. According to the |
620 of this rule should match the string @{term "s\<^sub>1 @ s\<^sub>2"}. According to the |
616 longest match rule, we want that the @{term "s\<^sub>1"} is the longest initial |
621 longest match rule, we want that the @{term "s\<^sub>1"} is the longest initial |
617 split of @{term "s\<^sub>1 @ s\<^sub>2"} such that @{term "s\<^sub>2"} is still recognised |
622 split of @{term "s\<^sub>1 @ s\<^sub>2"} such that @{term "s\<^sub>2"} is still recognised |
618 by @{term "r\<^sub>2"}. Let us assume, contrary to the third premise, that there |
623 by @{term "r\<^sub>2"}. Let us assume, contrary to the third premise, that there |
619 \emph{exists} an @{term "s\<^sub>3"} and @{term "s\<^sub>4"} such that @{term "s\<^sub>2"} |
624 \emph{exist} an @{term "s\<^sub>3"} and @{term "s\<^sub>4"} such that @{term "s\<^sub>2"} |
620 can be split up into a non-empty @{term "s\<^sub>3"} and @{term "s\<^sub>4"}. Moreover |
625 can be split up into a non-empty string @{term "s\<^sub>3"} and possibly empty |
621 the longer @{term "s\<^sub>1 @ s\<^sub>3"} can be matched by @{text "r\<^sub>1"} and the |
626 string @{term "s\<^sub>4"}. Moreover the longer string @{term "s\<^sub>1 @ s\<^sub>3"} can be |
622 shorter @{term "s\<^sub>4"} can still be matched by @{term "r\<^sub>2"}. In this case |
627 matched by @{text "r\<^sub>1"} and the shorter @{term "s\<^sub>4"} can still be |
623 @{term "s\<^sub>1"} would not be the longest initial split of @{term "s\<^sub>1 @ |
628 matched by @{term "r\<^sub>2"}. In this case @{term "s\<^sub>1"} would not be the |
624 s\<^sub>2"} and therefore @{term "Seq v\<^sub>1 v\<^sub>2"} cannot be a POSIX value |
629 longest initial split of @{term "s\<^sub>1 @ s\<^sub>2"} and therefore @{term "Seq v\<^sub>1 |
625 for @{term "(s\<^sub>1 @ s\<^sub>2, SEQ r\<^sub>1 r\<^sub>2)"}. A similar condition is imposed |
630 v\<^sub>2"} cannot be a POSIX value for @{term "(s\<^sub>1 @ s\<^sub>2, SEQ r\<^sub>1 r\<^sub>2)"}. |
626 onto the POSIX value in the @{text "P\<star>"}-rule. Also there we want that |
631 The main point is that this side-condition ensures the longest |
627 @{term "s\<^sub>1"} is the longest initial split of @{term "s\<^sub>1 @ s\<^sub>2"} and |
632 match rule is satisfied. |
628 furthermore the corresponding value @{term v} cannot be flatten to |
633 |
629 the empty string. In effect, we require that in each ``iteration'' |
634 A similar condition is imposed on the POSIX value in the @{text |
630 of the star, some parts of the string need to be ``nibbled'' away; only |
635 "P\<star>"}-rule. Also there we want that @{term "s\<^sub>1"} is the longest initial |
631 in case of the empty string weBy accept @{term "Stars []"} as the |
636 split of @{term "s\<^sub>1 @ s\<^sub>2"} and furthermore the corresponding value |
632 POSIX value. |
637 @{term v} cannot be flatten to the empty string. In effect, we require |
|
638 that in each ``iteration'' of the star, some non-empty substring need to |
|
639 be ``chipped'' away; only in case of the empty string we accept @{term |
|
640 "Stars []"} as the POSIX value. |
633 |
641 |
634 We can prove that given a string @{term s} and regular expression @{term |
642 We can prove that given a string @{term s} and regular expression @{term |
635 r}, the POSIX value @{term v} is uniquely determined by @{term "s \<in> r \<rightarrow> |
643 r}, the POSIX value @{term v} is uniquely determined by @{term "s \<in> r \<rightarrow> |
636 v"}. |
644 v"} (albeilt in an uncomputable fashion---for example rule @{term "P+R"} |
|
645 would require the calculation of the potentially infinite set @{term "L |
|
646 r\<^sub>1"}). |
637 |
647 |
638 \begin{theorem} |
648 \begin{theorem} |
639 @{thm[mode=IfThen] PMatch_determ(1)[of _ _ "v\<^sub>1" "v\<^sub>2"]} |
649 @{thm[mode=IfThen] PMatch_determ(1)[of _ _ "v\<^sub>1" "v\<^sub>2"]} |
640 \end{theorem} |
650 \end{theorem} |
641 |
651 |
642 \begin{proof} |
652 \begin{proof} By induction on the definition of @{term "s \<in> r \<rightarrow> v\<^sub>1"} and |
643 By induction on the definition of @{term "s \<in> r \<rightarrow> v\<^sub>1"} and a case |
653 a case analysis of @{term "s \<in> r \<rightarrow> v\<^sub>2"}. This proof requires the |
644 analysis of @{term "s \<in> r \<rightarrow> v\<^sub>2"}.\qed |
654 auxiliary lemma that @{thm (prem 1) PMatch1(1)} implies @{thm (concl) |
645 \end{proof} |
655 PMatch1(1)} and @{thm (concl) PMatch1(2)}, which are both easily |
|
656 established by inductions.\qed \end{proof} |
|
657 |
|
658 \noindent |
|
659 Next is the lemma that shows the function @{term "mkeps"} calculates |
|
660 the posix value for the empty string and a nullable regular expression. |
646 |
661 |
647 \begin{lemma}\label{lemmkeps} |
662 \begin{lemma}\label{lemmkeps} |
648 @{thm[mode=IfThen] PMatch_mkeps} |
663 @{thm[mode=IfThen] PMatch_mkeps} |
649 \end{lemma} |
664 \end{lemma} |
650 |
665 |
708 holds. Putting this all together, we can conclude with @{term "(c # |
723 holds. Putting this all together, we can conclude with @{term "(c # |
709 s) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (mkeps r\<^sub>1) (injval r\<^sub>2 c v\<^sub>1)"}. |
724 s) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (mkeps r\<^sub>1) (injval r\<^sub>2 c v\<^sub>1)"}. |
710 |
725 |
711 Finally suppose @{term "r = STAR r\<^sub>1"}. This case is very similar to the |
726 Finally suppose @{term "r = STAR r\<^sub>1"}. This case is very similar to the |
712 sequence case, except that we need to ensure that @{term "flat (injval r\<^sub>1 |
727 sequence case, except that we need to ensure that @{term "flat (injval r\<^sub>1 |
713 c v\<^sub>1) \<noteq> []"}. This follows by Lem.~\ref{posixbasic} from @{term "(c # s\<^sub>1) |
728 c v\<^sub>1) \<noteq> []"}. This follows from @{term "(c # s\<^sub>1) |
714 \<in> r' \<rightarrow> injval r\<^sub>1 c v\<^sub>1"} (which in turn follows from @{term "s\<^sub>1 \<in> der c |
729 \<in> r' \<rightarrow> injval r\<^sub>1 c v\<^sub>1"} (which in turn follows from @{term "s\<^sub>1 \<in> der c |
715 r\<^sub>1 \<rightarrow> v\<^sub>1"} and the induction hypothesis).\qed |
730 r\<^sub>1 \<rightarrow> v\<^sub>1"} and the induction hypothesis).\qed |
716 \end{proof} |
731 \end{proof} |
717 |
732 |
718 \noindent |
733 \noindent |
719 With Lem.~\ref{PMatch2} in place, it is completely routine to establish |
734 With Lem.~\ref{PMatch2} in place, it is completely routine to establish |
720 that the Sulzmann and Lu lexer satisfies its specification (returning |
735 that the Sulzmann and Lu lexer satisfies our specification (returning |
721 an ``error'' iff the string is not in the language of the regular expression, |
736 an ``error'' iff the string is not in the language of the regular expression, |
722 and returning a unique POSIX value iff the string \emph{is} in the language): |
737 and returning a unique POSIX value iff the string \emph{is} in the language): |
723 |
738 |
724 \begin{theorem}\mbox{}\smallskip\\ |
739 \begin{theorem}\mbox{}\smallskip\\ |
725 \begin{tabular}{ll} |
740 \begin{tabular}{ll} |
727 (2) & @{thm (lhs) lex_correct3b} if and only if @{thm (rhs) lex_correct3b}\\ |
742 (2) & @{thm (lhs) lex_correct3b} if and only if @{thm (rhs) lex_correct3b}\\ |
728 \end{tabular} |
743 \end{tabular} |
729 \end{theorem} |
744 \end{theorem} |
730 |
745 |
731 \begin{proof} |
746 \begin{proof} |
732 By induction on @{term s}.\qed |
747 By induction on @{term s} using Lem.~\ref{lemmkeps} and \ref{PMatch2}.\qed |
733 \end{proof} |
748 \end{proof} |
734 |
749 |
735 This concludes our correctness proof. Note that we have not changed the |
750 \noindent This concludes our correctness proof. Note that we have not |
736 algorithm by Sulzmann and Lu, but introduced our own specification for |
751 changed the algorithm by Sulzmann and Lu, but introduced our own |
737 what a correct result---a POSIX value---should be. |
752 specification for what a correct result---a POSIX value---should be. |
|
753 A strong point in favour of Sulzmann and Lu's algorithm is that it |
|
754 can be extended in various ways. |
|
755 |
738 *} |
756 *} |
739 |
757 |
740 section {* Extensions *} |
758 section {* Extensions and Optimisations*} |
741 |
759 |
742 text {* |
760 text {* |
|
761 |
|
762 If we are interested in tokenising string, then we need to not just |
|
763 split up the string into tokens, but also ``classify'' the tokens (for |
|
764 example whether it is a keyword or an identifier). This can be |
|
765 done with only minor modifications by introducing \emph{record regular |
|
766 expressions} and \emph{record values} (for example \cite{Sulzmann2014b}): |
|
767 |
|
768 \begin{center} |
|
769 @{text "r :="} |
|
770 @{text "..."} $\mid$ |
|
771 @{text "(l : r)"} \qquad\qquad |
|
772 @{text "v :="} |
|
773 @{text "..."} $\mid$ |
|
774 @{text "(l : v)"} |
|
775 \end{center} |
|
776 |
|
777 \noindent where @{text l} is a label, say a string, @{text r} a regular |
|
778 expression and @{text v} a value. All functions can be smoothly extended |
|
779 to these regular expressions and values. For example @{text "(l : r)"} is |
|
780 nullable iff @{term r} is, and so on. The purpose of the record regular |
|
781 expression is to mark certain parts of a regular expression and then |
|
782 record in the calculated value which parts of the string were matched by |
|
783 this part. The label can then serve for classifying tokens. Recall the |
|
784 regular expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} for keywords and |
|
785 identifiers from the Introduction. With record regular expression we can |
|
786 form @{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"} and then traverse the |
|
787 calculated value and only collect the underlying strings in record values. |
|
788 With this we obtain finite sequences of pairs of labels and strings, for |
|
789 example |
|
790 |
|
791 \[@{text "(l\<^sub>1 : s\<^sub>1), ..., (l\<^sub>n : s\<^sub>n)"}\] |
|
792 |
|
793 \noindent from which tokens with classifications (keyword-token, |
|
794 identifier-token and so on) can be extracted. |
743 |
795 |
744 Derivatives as calculated by \Brz's method are usually more complex |
796 Derivatives as calculated by \Brz's method are usually more complex |
745 regular expressions than the initial one; the result is that the matching |
797 regular expressions than the initial one; the result is that the matching |
746 and lexing algorithms are often absymally slow. |
798 and lexing algorithms are often abysmally slow. However, various |
747 |
799 optimisations are possible, such as the simplifications of @{term "ALT |
748 |
800 ZERO r"}, @{term "ALT r ZERO"}, @{term "SEQ ONE r"} and @{term "SEQ r |
749 various optimisations are |
801 ONE"} to @{term r}. One of the advantages of having a simple specification |
750 possible, such as the simplifications of @{term "ALT ZERO r"}, @{term "ALT |
802 and correctness proof is that the latter can be refined to allow for such |
751 r ZERO"}, @{term "SEQ ONE r"} and @{term "SEQ r ONE"} to @{term r}. One of |
803 optimisations and simple correctness proof. |
752 the advantages of having a simple specification and correctness proof is |
804 |
753 that the latter can be refined to allow for such optimisations and simple |
805 While the simplification of regular expressions according to |
754 correctness proof. |
806 simplification rules |
755 |
807 |
|
808 |
|
809 \noindent |
|
810 is well understood, there is a problem with the POSIX |
|
811 value calculation algorithm by Sulzmann and Lu: if we build a derivative |
|
812 regular expression and then simplify it, we will calculate a POSIX value |
|
813 for this simplified regular expression, \emph{not} for the original |
|
814 (unsimplified) derivative regular expression. Sulzmann and Lu overcome |
|
815 this problem by not just calculating a simplified regular expression, but |
|
816 also calculating a \emph{rectification function} that ``repairs'' the |
|
817 incorrect value. |
756 |
818 |
757 *} |
819 *} |
758 |
820 |
759 section {* The Argument by Sulzmmann and Lu *} |
821 section {* The Argument by Sulzmmann and Lu *} |
760 |
822 |