lexing: comparison thys/Paper/Paper.thy

equal deleted inserted replaced

-:4178b7e71809
+:a87b8a09ffe8
 F_SEQ1 ("F\<^bsub>Seq1\<^esub> _ _") and
 F_SEQ2 ("F\<^bsub>Seq2\<^esub> _ _") and
 F_SEQ ("F\<^bsub>Seq\<^esub> _ _") and
 simp_SEQ ("simp\<^bsub>Seq\<^esub> _ _" [1000, 1000] 1) and
 simp_ALT ("simp\<^bsub>Alt\<^esub> _ _" [1000, 1000] 1) and
-matcher3 ("lexer\<^sup>+ _ _")
+matcher3 ("lexer\<^sup>+ _ _" [78,78] 77)
 definition
 "match r s \<equiv> nullable (ders s r)"
 (*>*)
 @{term "simp_SEQ (r\<^sub>1, f\<^sub>1) (r\<^sub>2, f\<^sub>2)"} & $\dn$ & @{term "(SEQ r\<^sub>1 r\<^sub>2, F_SEQ f\<^sub>1 f\<^sub>2)"}\\
 \end{tabular}
 \end{center}
 \noindent
-The main simplification function is then
+The functions @{text "simp\<^bsub>ALT\<^esub>"} and @{text "simp\<^bsub>SEQ\<^esub>"} encode the simplification rules
+and compose the rectification functions. The main simplification function is then
 \begin{center}
 \begin{tabular}{lcl}
 @{term "simp (ALT r\<^sub>1 r\<^sub>2)"} & $\dn$ & @{term "simp_ALT (simp r\<^sub>1) (simp r\<^sub>2)"}\\
 @{term "simp (SEQ r\<^sub>1 r\<^sub>2)"} & $\dn$ & @{term "simp_SEQ (simp r\<^sub>1) (simp r\<^sub>2)"}\\
 @{term "simp r"} & $\dn$ & @{term "(r, id)"}\\
 \end{tabular}
 \end{center}
-\noindent where @{term "id"} stands for the identity function. Note that
+\noindent where @{term "id"} stands for the identity function.
+This function returns a simplified regular expression and a rectification
+function. Note that
 we do not simplify under stars: this seems to slow down the algorithm,
 rather than speed up. The optimised lexer is then given by the clauses:
 \begin{center}
 \begin{tabular}{lcl}
 \end{center}
 \noindent
 In the second clause we first calculate the derivative @{text "r \\ c"}
 and then simplify the result. This gives us a simplified derivative
-@{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The matcher
+@{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The lexer
-is recursively called with the simplified derivative, but before
+is then recursively called with the simplified derivative, but before
 we inject the character @{term c} into value, we need to rectify
-it (@{term "f\<^sub>r v"}). We can prove that
+it (that is construct @{term "f\<^sub>r v"}). We can prove that
 \begin{lemma}
 @{term "matcher3 r s = matcher r s"}
 \end{lemma}
 \noindent
-holds but refer the reader to our mechanisation for details.
+holds but for lack of space refer the reader to our mechanisation for details.
 *}
 section {* The Correctness Argument by Sulzmmann and Lu *}
 $\infer{v_{1} \posix_{r_{1}} v'_{1}}
 {Seq\,v_{1}\,v_{2} \posix_{r_{1}r_{2}} Seq\,v'_{1}\,v'_{2}}(C2)$
 &
 $\infer{v_{2} \posix_{r_{2}} v'_{2}}
 {Seq\,v_{1}\,v_{2} \posix_{r_{1}r_{2}} Seq\,v_{1}\,v'_{2}}(C1)$
-\medskip\\
+\smallskip\\
 $\infer{ len |v_{2}| > len |v_{1}|}
 {Right \; v_{2} \posix_{r_{1}+r_{2}} Left \; v_{1}}(A1)$
 &
 $\infer{ len |v_{1}| \geq len |v_{2}|}
 {Left \; v_{1} \posix_{r_{1}+r_{2}} Right \; v_{2}} (A2)$
-\medskip\\
+\smallskip\\
 $\infer{ v_{2} \posix_{r_{2}} v'_{2}}
 {Right \; v_{2} \posix_{r_{1}+r_{2}} Right \; v'_{2}}(A3)$ &
 $\infer{ v_{1} \posix_{r_{1}} v'_{1}}
 {Left \; v_{1} \posix_{r_{1}+r_{2}} Left \; v'_{1}}(A4)$
-\medskip\\
+\smallskip\\
 $\infer{|v :: vs| = []} {Stars\,[] \posix_{r^{\star}} Stars\,(v :: vs)}(K1)$ &
 $\infer{|v :: vs| \neq []} {Stars\,(v :: vs) \posix_{r^{\star}} Stars\,[]}(K2)$
-\medskip\\
+\smallskip\\
 $\infer{ v_{1} \posix_{r} v_{2}}
 {Stars\,(v_{1} :: vs_{1}) \posix_{r^{\star}} Stars\,(v_{2} :: vs_{2})}(K3)$ &
 $\infer{ Stars\,vs_{1} \posix_{r^{\star}} Stars\,vs_{2}}
 {Stars\,(v :: vs_{1}) \posix_{r^{\star}} Stars\,(v :: vs_{2})}(K4)$
 text {*
 We have implemented the POSIX value calculation algorithm introduced in
 \cite{Sulzmann2014}. Our implementation is nearly identical to the
-original and all modifications are harmless (like our char-clause for
+original and all modifications we introduced are harmless (like our char-clause for
 @{text inj}). We have proved this algorithm to be correct, but correct
 according to our own specification of what POSIX values are. Our
 specification (inspired from work in \cite{Vansummeren2006}) appears to be
 much simpler than in \cite{Sulzmann2014} and our proofs are nearly always
 straightforward. We have attempted to formalise the original proof
 by Sulzmann and Lu \cite{Sulzmann2014}, but we believe it contains
 unfillable gaps. In the online version of \cite{Sulzmann2014}, the authors
 already acknowledge some small problems, but our experience suggests
-there are more serious problems.
+that there are more serious problems.
 Closely related to our work is an automata-based lexer formalised by
 Nipkow \cite{Nipkow98}. This lexer also splits up strings into longest
 initial substrings, but Nipkow's algorithm is not completely
 computational. The algorithm by Sulzmann and Lu, in contrast, can be
-implemented easily in functional languages. A bespoke lexer for the
+implemented with easy in any functional language. A bespoke lexer for the
-Imp-Language is formalised in Coq as part of the Software Foundations book
+Imp-language is formalised in Coq as part of the Software Foundations book
 \cite{Pierce2015}. The disadvantage of such bespoke lexers is that they
 do not generalise easily to more advanced features.
 Our formalisation is available from
 \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex}.

changeset 138	a87b8a09ffe8
parent 137	4178b7e71809
child 139	f28cf86a1e7f