# HG changeset patch
# User Christian Urban
# Date 1457436891 0
# Node ID a87b8a09ffe86b9284afef3e48050bb9a007f0b5
# Parent 4178b7e718090e45779f2f3876725ade10a74aaf
updated

diff -r 4178b7e71809 -r a87b8a09ffe8 thys/Paper/Paper.thy
--- a/thys/Paper/Paper.thy Tue Mar 08 11:24:56 2016 +0000
+++ b/thys/Paper/Paper.thy Tue Mar 08 11:34:51 2016 +0000
@@ -52,7 +52,7 @@
 F_SEQ ("F\<^bsub>Seq\<^esub> _ _") and
 simp_SEQ ("simp\<^bsub>Seq\<^esub> _ _" [1000, 1000] 1) and
 simp_ALT ("simp\<^bsub>Alt\<^esub> _ _" [1000, 1000] 1) and
- matcher3 ("lexer\<^sup>+ _ _")
+ matcher3 ("lexer\<^sup>+ _ _" [78,78] 77)
 
 definition
 "match r s \<equiv> nullable (ders s r)"
@@ -851,7 +851,8 @@
 \end{center}
 
 \noindent
- The main simplification function is then
+ The functions @{text "simp\<^bsub>ALT\<^esub>"} and @{text "simp\<^bsub>SEQ\<^esub>"} encode the simplification rules
+ and compose the rectification functions. The main simplification function is then
 
 \begin{center}
 \begin{tabular}{lcl}
@@ -861,7 +862,9 @@
 \end{tabular}
 \end{center}
 
- \noindent where @{term "id"} stands for the identity function. Note that
+ \noindent where @{term "id"} stands for the identity function.
+ This function returns a simplified regular expression and a rectification
+ function. Note that
 we do not simplify under stars: this seems to slow down the algorithm,
 rather than speed up. The optimised lexer is then given by the clauses:
@@ -879,17 +882,17 @@
 \noindent
 In the second clause we first calculate the derivative @{text "r \\ c"}
 and then simplify the result. This gives us a simplified derivative
- @{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The matcher
- is recursively called with the simplified derivative, but before
+ @{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The lexer
+ is then recursively called with the simplified derivative, but before
 we inject the character @{term c} into value, we need to rectify
- it (@{term "f\<^sub>r v"}). We can prove that
+ it (that is, construct @{term "f\<^sub>r v"}). We can prove that
 
 \begin{lemma}
 @{term "matcher3 r s = matcher r s"}
 \end{lemma}
 
 \noindent
- holds but refer the reader to our mechanisation for details.
+ holds, but for lack of space we refer the reader to our mechanisation for details.
 
 *}
@@ -914,7 +917,7 @@
 &
 $\infer{v_{2} \posix_{r_{2}} v'_{2}}
 {Seq\,v_{1}\,v_{2} \posix_{r_{1}r_{2}} Seq\,v_{1}\,v'_{2}}(C1)$
-\medskip\\
+\smallskip\\
 $\infer{ len |v_{2}| > len |v_{1}|}
@@ -922,18 +925,18 @@
 &
 $\infer{ len |v_{1}| \geq len |v_{2}|}
 {Left \; v_{1} \posix_{r_{1}+r_{2}} Right \; v_{2}} (A2)$
-\medskip\\
+\smallskip\\
 $\infer{ v_{2} \posix_{r_{2}} v'_{2}}
 {Right \; v_{2} \posix_{r_{1}+r_{2}} Right \; v'_{2}}(A3)$
 &
 $\infer{ v_{1} \posix_{r_{1}} v'_{1}}
 {Left \; v_{1} \posix_{r_{1}+r_{2}} Left \; v'_{1}}(A4)$
-\medskip\\
+\smallskip\\
 $\infer{|v :: vs| = []}
 {Stars\,[] \posix_{r^{\star}} Stars\,(v :: vs)}(K1)$
 &
 $\infer{|v :: vs| \neq []}
 {Stars\,(v :: vs) \posix_{r^{\star}} Stars\,[]}(K2)$
-\medskip\\
+\smallskip\\
 $\infer{ v_{1} \posix_{r} v_{2}}
 {Stars\,(v_{1} :: vs_{1}) \posix_{r^{\star}} Stars\,(v_{2} :: vs_{2})}(K3)$
 &
@@ -1116,7 +1119,7 @@
 We have implemented the POSIX value calculation algorithm introduced in
 \cite{Sulzmann2014}. Our implementation is nearly identical to the
- original and all modifications are harmless (like our char-clause for
+ original and all modifications we introduced are harmless (like our char-clause for
 @{text inj}). We have proved this algorithm to be correct, but correct
 according to our own specification of what POSIX values are. Our
 specification (inspired from work in \cite{Vansummeren2006}) appears to be
@@ -1125,14 +1128,14 @@
 by Sulzmann and Lu \cite{Sulzmann2014}, but we believe it contains
 unfillable gaps. In the online version of \cite{Sulzmann2014}, the authors
 already acknowledge some small problems, but our experience suggests
- there are more serious problems.
+ that there are more serious problems.
 
 Closely related to our work is an automata-based lexer formalised by
 Nipkow \cite{Nipkow98}. This lexer also splits up strings into longest
 initial substrings, but Nipkow's algorithm is not completely
 computational. The algorithm by Sulzmann and Lu, in contrast, can be
- implemented easily in functional languages. A bespoke lexer for the
- Imp-Language is formalised in Coq as part of the Software Foundations book
+ implemented with ease in any functional language. A bespoke lexer for the
+ Imp-language is formalised in Coq as part of the Software Foundations book
 \cite{Pierce2015}. The disadvantage of such bespoke lexers is that they
 do not generalise easily to more advanced features.
 Our formalisation is available from

diff -r 4178b7e71809 -r a87b8a09ffe8 thys/paper.pdf
Binary file thys/paper.pdf has changed
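
To make the description in the patched text concrete, the following is a small,
self-contained sketch in Haskell of the simplifying lexer discussed above. It is
an illustration only, not the Isabelle mechanisation: the names Rexp, Val, lexerS
and the rectification combinators fRight, fLeft, fAlt, fSeq, fSeq1, fSeq2 are
chosen for this sketch. The function simp returns a simplified regular expression
together with a rectification function; lexerS builds the derivative, simplifies
it, recurses, rectifies the returned value and only then injects the character.

-- Sketch (assumed names) of the simplifying derivative-based lexer.
data Rexp = Zero | One | Chr Char
          | Alt Rexp Rexp | Seq Rexp Rexp | Star Rexp
          deriving (Eq, Show)

data Val = Empty | Chv Char | Lft Val | Rgt Val
         | Sq Val Val | Stars [Val]
         deriving (Eq, Show)

nullable :: Rexp -> Bool
nullable Zero        = False
nullable One         = True
nullable (Chr _)     = False
nullable (Alt r1 r2) = nullable r1 || nullable r2
nullable (Seq r1 r2) = nullable r1 && nullable r2
nullable (Star _)    = True

-- Brzozowski derivative with respect to a single character
der :: Char -> Rexp -> Rexp
der _ Zero        = Zero
der _ One         = Zero
der c (Chr d)     = if c == d then One else Zero
der c (Alt r1 r2) = Alt (der c r1) (der c r2)
der c (Seq r1 r2)
  | nullable r1   = Alt (Seq (der c r1) r2) (der c r2)
  | otherwise     = Seq (der c r1) r2
der c (Star r)    = Seq (der c r) (Star r)

-- value witnessing how a nullable regex matches the empty string
mkeps :: Rexp -> Val
mkeps One         = Empty
mkeps (Alt r1 r2) = if nullable r1 then Lft (mkeps r1) else Rgt (mkeps r2)
mkeps (Seq r1 r2) = Sq (mkeps r1) (mkeps r2)
mkeps (Star _)    = Stars []
mkeps _           = error "mkeps: regex not nullable"

-- injection of the character back into a value for the derivative
inj :: Rexp -> Char -> Val -> Val
inj (Chr d)     _ Empty             = Chv d
inj (Alt r1 _)  c (Lft v)           = Lft (inj r1 c v)
inj (Alt _ r2)  c (Rgt v)           = Rgt (inj r2 c v)
inj (Seq r1 _)  c (Sq v1 v2)        = Sq (inj r1 c v1) v2
inj (Seq r1 _)  c (Lft (Sq v1 v2))  = Sq (inj r1 c v1) v2
inj (Seq r1 r2) c (Rgt v)           = Sq (mkeps r1) (inj r2 c v)
inj (Star r)    c (Sq v (Stars vs)) = Stars (inj r c v : vs)
inj _           _ _                 = error "inj: ill-formed value"

-- rectification combinators, composed by simp
fRight, fLeft :: (Val -> Val) -> Val -> Val
fRight f v = Rgt (f v)
fLeft  f v = Lft (f v)

fAlt, fSeq, fSeq1, fSeq2 :: (Val -> Val) -> (Val -> Val) -> Val -> Val
fAlt f1 _  (Lft v)    = Lft (f1 v)
fAlt _  f2 (Rgt v)    = Rgt (f2 v)
fAlt _  _  v          = v
fSeq f1 f2 (Sq v1 v2) = Sq (f1 v1) (f2 v2)
fSeq _  _  v          = v
fSeq1 f1 f2 v         = Sq (f1 Empty) (f2 v)  -- first component was simplified to One
fSeq2 f1 f2 v         = Sq (f1 v) (f2 Empty)  -- second component was simplified to One

-- simp removes Zero from alternatives and One from sequences, returning the
-- simplified regex plus a rectification function; stars are left untouched
simp :: Rexp -> (Rexp, Val -> Val)
simp (Alt r1 r2) =
  case (simp r1, simp r2) of
    ((Zero, _ ), (s2, f2)) -> (s2, fRight f2)
    ((s1, f1), (Zero, _ )) -> (s1, fLeft f1)
    ((s1, f1), (s2, f2))   -> (Alt s1 s2, fAlt f1 f2)
simp (Seq r1 r2) =
  case (simp r1, simp r2) of
    ((One, f1), (s2, f2)) -> (s2, fSeq1 f1 f2)
    ((s1, f1), (One, f2)) -> (s1, fSeq2 f1 f2)
    ((s1, f1), (s2, f2))  -> (Seq s1 s2, fSeq f1 f2)
simp r = (r, id)

-- optimised lexer: derivative, simplify, recurse, rectify, then inject
lexerS :: Rexp -> String -> Maybe Val
lexerS r [] = if nullable r then Just (mkeps r) else Nothing
lexerS r (c:cs) =
  let (rs, fr) = simp (der c r)
  in case lexerS rs cs of
       Nothing -> Nothing
       Just v  -> Just (inj r c (fr v))

With these illustrative definitions, lexerS (Star (Alt (Chr 'a') (Seq (Chr 'a')
(Chr 'b')))) "ab" evaluates to Just (Stars [Rgt (Sq (Chv 'a') (Chv 'b'))]), the
POSIX value that matches the whole of "ab" using the longer alternative.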