60 ValOrdEq ("_ \<ge>\<^bsub>_\<^esub> _" [77,77,77] 77) |
60 ValOrdEq ("_ \<ge>\<^bsub>_\<^esub> _" [77,77,77] 77) |
61 |
61 |
62 definition |
62 definition |
63 "match r s \<equiv> nullable (ders s r)" |
63 "match r s \<equiv> nullable (ders s r)" |
64 |
64 |
|
65 (* |
|
66 comments not implemented |
|
67 |
|
68 p9. "None is returned, indicating an error is raised" : there is no error |
|
69 |
|
70 p9. The condtion "not exists s3 s4..." appears often enough (in particular in |
|
71 the proof of Lemma 3) to warrant a definition. |
|
72 |
|
73 p9. "The POSIX value v" : at this point the existence of a POSIX value is yet |
|
74 to be shown |
|
75 |
|
76 p10. (proof Lemma 3) : separating the cases with description/itemize would greatly |
|
77 improve readability |
|
78 |
|
79 p11. Theorem 2(2) : Stressing the uniqueness is strange, given that it follows |
|
80 trivially from the fact that "lexer" is a function. Maybe state the existence of |
|
81 a unique POSIX value as corollary. |
|
82 |
|
83 p14. Proposition 3 is evidently false and Proposition 4 is a conjecture. |
|
84 I find it confusing that Propositions 1 and 2, which are proven facts, are also called propositions. |
|
85 |
|
86 *) |
|
87 |
65 (*>*) |
88 (*>*) |
66 |
89 |
67 section {* Introduction *} |
90 section {* Introduction *} |
68 |
91 |
69 |
92 |
412 @{term b} and @{term c}). We then use @{const nullable} to find out |
435 @{term b} and @{term c}). We then use @{const nullable} to find out |
413 whether the resulting derivative regular expression @{term "r\<^sub>4"} |
436 whether the resulting derivative regular expression @{term "r\<^sub>4"} |
414 can match the empty string. If yes, we call the function @{const mkeps} |
437 can match the empty string. If yes, we call the function @{const mkeps} |
415 that produces a value @{term "v\<^sub>4"} for how @{term "r\<^sub>4"} can |
438 that produces a value @{term "v\<^sub>4"} for how @{term "r\<^sub>4"} can |
416 match the empty string (taking into account the POSIX constraints in case |
439 match the empty string (taking into account the POSIX constraints in case |
417 there are several ways). This functions is defined by the clauses: |
440 there are several ways). This function is defined by the clauses: |
418 |
441 |
419 \begin{figure}[t] |
442 \begin{figure}[t] |
420 \begin{center} |
443 \begin{center} |
421 \begin{tikzpicture}[scale=2,node distance=1.3cm, |
444 \begin{tikzpicture}[scale=2,node distance=1.3cm, |
422 every node/.style={minimum size=7mm}] |
445 every node/.style={minimum size=7mm}] |
517 @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} $\dn$ @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} |
540 @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} $\dn$ @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} |
518 \end{center} |
541 \end{center} |
519 |
542 |
520 \noindent Consider first the @{text "else"}-branch where the derivative is @{term |
543 \noindent Consider first the @{text "else"}-branch where the derivative is @{term |
521 "SEQ (der c r\<^sub>1) r\<^sub>2"}. The corresponding value must therefore |
544 "SEQ (der c r\<^sub>1) r\<^sub>2"}. The corresponding value must therefore |
522 be the form @{term "Seq v\<^sub>1 v\<^sub>2"}, which matches the left-hand |
545 be of the form @{term "Seq v\<^sub>1 v\<^sub>2"}, which matches the left-hand |
523 side in clause~(4) of @{term inj}. In the @{text "if"}-branch the derivative is an |
546 side in clause~(4) of @{term inj}. In the @{text "if"}-branch the derivative is an |
524 alternative, namely @{term "ALT (SEQ (der c r\<^sub>1) r\<^sub>2) (der c |
547 alternative, namely @{term "ALT (SEQ (der c r\<^sub>1) r\<^sub>2) (der c |
525 r\<^sub>2)"}. This means we either have to consider a @{text Left}- or |
548 r\<^sub>2)"}. This means we either have to consider a @{text Left}- or |
526 @{text Right}-value. In case of the @{text Left}-value we know further it |
549 @{text Right}-value. In case of the @{text Left}-value we know further it |
527 must be a value for a sequence regular expression. Therefore the pattern |
550 must be a value for a sequence regular expression. Therefore the pattern |
557 \end{tabular} |
580 \end{tabular} |
558 \end{lemma} |
581 \end{lemma} |
559 |
582 |
560 \begin{proof} |
583 \begin{proof} |
561 Both properties are by routine inductions: the first one can, for example, |
584 Both properties are by routine inductions: the first one can, for example, |
562 be proved by an induction over the definition of @{term derivatives}; the second by |
585 be proved by induction over the definition of @{term derivatives}; the second by |
563 an induction on @{term r}. There are no interesting cases.\qed |
586 an induction on @{term r}. There are no interesting cases.\qed |
564 \end{proof} |
587 \end{proof} |
565 |
588 |
566 Having defined the @{const mkeps} and @{text inj} function we can extend |
589 Having defined the @{const mkeps} and @{text inj} function we can extend |
567 \Brz's matcher so that a [lexical] value is constructed (assuming the |
590 \Brz's matcher so that a [lexical] value is constructed (assuming the |
682 \noindent |
705 \noindent |
683 The central lemma for our POSIX relation is that the @{text inj}-function |
706 The central lemma for our POSIX relation is that the @{text inj}-function |
684 preserves POSIX values. |
707 preserves POSIX values. |
685 |
708 |
686 \begin{lemma}\label{Posix2} |
709 \begin{lemma}\label{Posix2} |
687 @{thm[mode=IfThen] Posix2_roy_version} |
710 @{thm[mode=IfThen] Posix_injval} |
688 \end{lemma} |
711 \end{lemma} |
689 |
712 |
690 \begin{proof} |
713 \begin{proof} |
691 |
714 |
692 By induction on @{text r}. Suppose @{term "r = ALT r\<^sub>1 r\<^sub>2"}. There are |
715 By induction on @{text r}. Suppose @{term "r = ALT r\<^sub>1 r\<^sub>2"}. There are |
720 \[@{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \<and> (c # s\<^sub>1) @ s\<^sub>3 \<in> L r\<^sub>1 \<and> s\<^sub>4 \<in> L r\<^sub>2)"}\] |
743 \[@{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \<and> (c # s\<^sub>1) @ s\<^sub>3 \<in> L r\<^sub>1 \<and> s\<^sub>4 \<in> L r\<^sub>2)"}\] |
721 |
744 |
722 \noindent We can use the induction hypothesis for @{text "r\<^sub>1"} to obtain |
745 \noindent We can use the induction hypothesis for @{text "r\<^sub>1"} to obtain |
723 @{term "(c # s\<^sub>1) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"}. Putting this all together allows us to infer |
746 @{term "(c # s\<^sub>1) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"}. Putting this all together allows us to infer |
724 @{term "((c # s\<^sub>1) @ s\<^sub>2) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (injval r\<^sub>1 c v\<^sub>1) v\<^sub>2"}. The case @{text "(c)"} |
747 @{term "((c # s\<^sub>1) @ s\<^sub>2) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (injval r\<^sub>1 c v\<^sub>1) v\<^sub>2"}. The case @{text "(c)"} |
725 is similarly. |
748 is similar. |
726 |
749 |
727 For @{text "(b)"} we know @{term "s \<in> der c r\<^sub>2 \<rightarrow> v\<^sub>1"} and |
750 For @{text "(b)"} we know @{term "s \<in> der c r\<^sub>2 \<rightarrow> v\<^sub>1"} and |
728 @{term "s\<^sub>1 @ s\<^sub>2 \<notin> L (SEQ (der c r\<^sub>1) r\<^sub>2)"}. From the former |
751 @{term "s\<^sub>1 @ s\<^sub>2 \<notin> L (SEQ (der c r\<^sub>1) r\<^sub>2)"}. From the former |
729 we have @{term "(c # s) \<in> r\<^sub>2 \<rightarrow> (injval r\<^sub>2 c v\<^sub>1)"} by induction hypothesis |
752 we have @{term "(c # s) \<in> r\<^sub>2 \<rightarrow> (injval r\<^sub>2 c v\<^sub>1)"} by induction hypothesis |
730 for @{term "r\<^sub>2"}. From the latter we can infer |
753 for @{term "r\<^sub>2"}. From the latter we can infer |
758 \begin{proof} |
781 \begin{proof} |
759 By induction on @{term s} using Lem.~\ref{lemmkeps} and \ref{Posix2}.\qed |
782 By induction on @{term s} using Lem.~\ref{lemmkeps} and \ref{Posix2}.\qed |
760 \end{proof} |
783 \end{proof} |
761 |
784 |
762 \noindent This concludes our correctness proof. Note that we have not |
785 \noindent This concludes our correctness proof. Note that we have not |
763 changed the algorithm of Sulzmann and Lu,\footnote{Any deviation we |
786 changed the algorithm of Sulzmann and Lu,\footnote{All deviations we |
764 introduced is harmless.} but introduced our own specification for what a |
787 introduced are harmless.} but introduced our own specification for what a |
765 correct result---a POSIX value---should be. A strong point in favour of |
788 correct result---a POSIX value---should be. A strong point in favour of |
766 Sulzmann and Lu's algorithm is that it can be extended in various ways. |
789 Sulzmann and Lu's algorithm is that it can be extended in various ways. |
767 |
790 |
768 *} |
791 *} |
769 |
792 |
889 & & $|$ @{term "Some v"} @{text "\<Rightarrow>"} @{text "Some (inj r c (f\<^sub>r v))"} |
912 & & $|$ @{term "Some v"} @{text "\<Rightarrow>"} @{text "Some (inj r c (f\<^sub>r v))"} |
890 \end{tabular} |
913 \end{tabular} |
891 \end{center} |
914 \end{center} |
892 |
915 |
893 \noindent |
916 \noindent |
894 In the second clause we first calculate the derivative @{text "r \\ c"} |
917 In the second clause we first calculate the derivative @{term "der c r"} |
895 and then simplify the result. This gives us a simplified derivative |
918 and then simplify the result. This gives us a simplified derivative |
896 @{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The lexer |
919 @{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The lexer |
897 is then recursively called with the simplified derivative, but before |
920 is then recursively called with the simplified derivative, but before |
898 we inject the character @{term c} into value, we need to rectify |
921 we inject the character @{term c} into the value, we need to rectify |
899 it (that is construct @{term "f\<^sub>r v"}). We can prove that |
922 it (that is construct @{term "f\<^sub>r v"}). We can prove that |
900 |
923 |
901 \begin{lemma} |
924 \begin{theorem} |
902 @{thm slexer_correctness} |
925 @{thm slexer_correctness} |
903 \end{lemma} |
926 \end{theorem} |
904 |
927 |
905 \noindent |
928 \noindent |
906 holds but for lack of space refer the reader to our mechanisation for details. |
929 holds but for lack of space refer the reader to our mechanisation for details. |
907 |
930 |
908 *} |
931 *} |
909 |
932 |
910 section {* The Correctness Argument by Sulzmmann and Lu\label{argu} *} |
933 section {* The Correctness Argument by Sulzmann and Lu\label{argu} *} |
911 |
934 |
912 text {* |
935 text {* |
913 % \newcommand{\greedy}{\succcurlyeq_{gr}} |
936 % \newcommand{\greedy}{\succcurlyeq_{gr}} |
914 \newcommand{\posix}{>} |
937 \newcommand{\posix}{>} |
915 |
938 |