equal
deleted
inserted
replaced
822 r)"}} is nullable iff @{term r} is, and so on. The purpose of the record |
822 r)"}} is nullable iff @{term r} is, and so on. The purpose of the record |
823 regular expression is to mark certain parts of a regular expression and |
823 regular expression is to mark certain parts of a regular expression and |
824 then record in the calculated value which parts of the string were matched |
824 then record in the calculated value which parts of the string were matched |
825 by this part. The label can then serve as classification for the tokens. |
825 by this part. The label can then serve as classification for the tokens. |
826 For this recall the regular expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} for |
826 For this recall the regular expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} for |
827 keywords and identifiers from the Introduction. With record regular |
827 keywords and identifiers from the Introduction. With the record regular |
828 expression we can form \mbox{@{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"}} |
828 expression we can form \mbox{@{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"}} |
829 and then traverse the calculated value and only collect the underlying |
829 and then traverse the calculated value and only collect the underlying |
830 strings in record values. With this we obtain finite sequences of pairs of |
830 strings in record values. With this we obtain finite sequences of pairs of |
831 labels and strings, for example |
831 labels and strings, for example |
832 |
832 |
847 steps. |
847 steps. |
848 |
848 |
849 While the simplification of regular expressions according to |
849 While the simplification of regular expressions according to |
850 rules like |
850 rules like |
851 |
851 |
852 \begin{center} |
852 \begin{equation}\label{Simpl} |
853 \begin{tabular}{lcl} |
853 \begin{array}{lcl} |
854 @{term "ALT ZERO r"} & @{text "\<Rightarrow>"} & @{term r}\\ |
854 @{term "ALT ZERO r"} & @{text "\<Rightarrow>"} & @{term r}\\ |
855 @{term "ALT r ZERO"} & @{text "\<Rightarrow>"} & @{term r}\\ |
855 @{term "ALT r ZERO"} & @{text "\<Rightarrow>"} & @{term r}\\ |
856 @{term "SEQ ONE r"} & @{text "\<Rightarrow>"} & @{term r}\\ |
856 @{term "SEQ ONE r"} & @{text "\<Rightarrow>"} & @{term r}\\ |
857 @{term "SEQ r ONE"} & @{text "\<Rightarrow>"} & @{term r} |
857 @{term "SEQ r ONE"} & @{text "\<Rightarrow>"} & @{term r} |
858 \end{tabular} |
858 \end{array} |
859 \end{center} |
859 \end{equation} |
860 |
860 |
861 \noindent is well understood, there is an obstacle with the POSIX value |
861 \noindent is well understood, there is an obstacle with the POSIX value |
862 calculation algorithm by Sulzmann and Lu: if we build a derivative regular |
862 calculation algorithm by Sulzmann and Lu: if we build a derivative regular |
863 expression and then simplify it, we will calculate a POSIX value for this |
863 expression and then simplify it, we will calculate a POSIX value for this |
864 simplified derivative regular expression, \emph{not} for the original (unsimplified) |
864 simplified derivative regular expression, \emph{not} for the original (unsimplified) |
890 \end{tabular} |
890 \end{tabular} |
891 \end{center} |
891 \end{center} |
892 |
892 |
893 \noindent |
893 \noindent |
894 The functions @{text "simp\<^bsub>Alt\<^esub>"} and @{text "simp\<^bsub>Seq\<^esub>"} encode the simplification rules |
894 The functions @{text "simp\<^bsub>Alt\<^esub>"} and @{text "simp\<^bsub>Seq\<^esub>"} encode the simplification rules |
895 and compose the rectification functions. The main simplification function is then |
895 in \eqref{Simpl} and compose the rectification functions (simplifications can occur |
|
896 deep inside the regular expression). The main simplification function is then |
896 |
897 |
897 \begin{center} |
898 \begin{center} |
898 \begin{tabular}{lcl} |
899 \begin{tabular}{lcl} |
899 @{term "simp (ALT r\<^sub>1 r\<^sub>2)"} & $\dn$ & @{term "simp_ALT (simp r\<^sub>1) (simp r\<^sub>2)"}\\ |
900 @{term "simp (ALT r\<^sub>1 r\<^sub>2)"} & $\dn$ & @{term "simp_ALT (simp r\<^sub>1) (simp r\<^sub>2)"}\\ |
900 @{term "simp (SEQ r\<^sub>1 r\<^sub>2)"} & $\dn$ & @{term "simp_SEQ (simp r\<^sub>1) (simp r\<^sub>2)"}\\ |
901 @{term "simp (SEQ r\<^sub>1 r\<^sub>2)"} & $\dn$ & @{term "simp_SEQ (simp r\<^sub>1) (simp r\<^sub>2)"}\\ |
922 \noindent |
923 \noindent |
923 In the second clause we first calculate the derivative @{term "der c r"} |
924 In the second clause we first calculate the derivative @{term "der c r"} |
924 and then simplify the result. This gives us a simplified derivative |
925 and then simplify the result. This gives us a simplified derivative |
925 @{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The lexer |
926 @{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The lexer |
926 is then recursively called with the simplified derivative, but before |
927 is then recursively called with the simplified derivative, but before |
927 we inject the character @{term c} into the value, we need to rectify |
928 we inject the character @{term c} into the value @{term v}, we need to rectify |
928 it (that is construct @{term "f\<^sub>r v"}). We can prove that |
929 @{term v} (that is construct @{term "f\<^sub>r v"}). We can prove that |
929 |
930 |
930 \begin{theorem} |
931 \begin{theorem} |
931 @{thm slexer_correctness} |
932 @{thm slexer_correctness} |
932 \end{theorem} |
933 \end{theorem} |
933 |
934 |
1171 \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex}.\medskip |
1172 \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex}.\medskip |
1172 |
1173 |
1173 \noindent |
1174 \noindent |
1174 {\bf Acknowledgements:} |
1175 {\bf Acknowledgements:} |
1175 We are very grateful to Martin Sulzmann for his comments on our work and |
1176 We are very grateful to Martin Sulzmann for his comments on our work and |
1176 also patiently explaining to us the details in \cite{Sulzmann2014}. |
1177 moreover patiently explaining to us the details in \cite{Sulzmann2014}. We |
|
1178 also received very helpful comments from James Cheney and anonymous referees. |
1177 \small |
1179 \small |
1178 \bibliographystyle{plain} |
1180 \bibliographystyle{plain} |
1179 \bibliography{root} |
1181 \bibliography{root} |
1180 |
1182 |
1181 *} |
1183 *} |