thys/Paper/Paper.thy
changeset 138 a87b8a09ffe8
parent 137 4178b7e71809
child 139 f28cf86a1e7f
--- a/thys/Paper/Paper.thy	Tue Mar 08 11:24:56 2016 +0000
+++ b/thys/Paper/Paper.thy	Tue Mar 08 11:34:51 2016 +0000
@@ -52,7 +52,7 @@
   F_SEQ ("F\<^bsub>Seq\<^esub> _ _") and
   simp_SEQ ("simp\<^bsub>Seq\<^esub> _ _" [1000, 1000] 1) and
   simp_ALT ("simp\<^bsub>Alt\<^esub> _ _" [1000, 1000] 1) and
-  matcher3 ("lexer\<^sup>+ _ _")
+  matcher3 ("lexer\<^sup>+ _ _" [78,78] 77)
 
 definition 
   "match r s \<equiv> nullable (ders s r)"
@@ -851,7 +851,8 @@
   \end{center}
 
   \noindent
-  The main simplification function is then 
+  The functions @{text "simp\<^bsub>ALT\<^esub>"} and @{text "simp\<^bsub>SEQ\<^esub>"} encode the simplification rules
+  and compose the rectification functions. The main simplification function is then 
 
   \begin{center}
   \begin{tabular}{lcl}
@@ -861,7 +862,9 @@
   \end{tabular}
   \end{center} 
 
-  \noindent where @{term "id"} stands for the identity function. Note that
+  \noindent where @{term "id"} stands for the identity function. 
+  This function returns a simplified regular expression and a rectification 
+  function. Note that
   we do not simplify under stars: this seems to slow down the algorithm,
   rather than speed up. The optimised lexer is then given by the clauses:
   
@@ -879,17 +882,17 @@
   \noindent
   In the second clause we first calculate the derivative @{text "r \\ c"}
   and then simplify the result. This gives us a simplified derivative
-  @{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The matcher
-  is recursively called with the simplified derivative, but before
+  @{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The lexer
+  is then recursively called with the simplified derivative, but before
   we inject the character @{term c} into value, we need to rectify
-  it (@{term "f\<^sub>r v"}). We can prove that
+  it (that is construct @{term "f\<^sub>r v"}). We can prove that
 
   \begin{lemma}
   @{term "matcher3 r s = matcher r s"}
   \end{lemma}
 
   \noindent
-  holds but refer the reader to our mechanisation for details.
+  holds but for lack of space refer the reader to our mechanisation for details.
 
 *}
 
@@ -914,7 +917,7 @@
 &
 $\infer{v_{2} \posix_{r_{2}} v'_{2}} 
        {Seq\,v_{1}\,v_{2} \posix_{r_{1}r_{2}} Seq\,v_{1}\,v'_{2}}(C1)$
-\medskip\\
+\smallskip\\
 		
 		
 $\infer{ len |v_{2}| > len |v_{1}|} 
@@ -922,18 +925,18 @@
 &
 $\infer{ len |v_{1}| \geq len |v_{2}|} 
        {Left \; v_{1} \posix_{r_{1}+r_{2}} Right \; v_{2}} (A2)$ 
-\medskip\\	
+\smallskip\\	
 
 $\infer{ v_{2} \posix_{r_{2}} v'_{2}} 
        {Right \; v_{2} \posix_{r_{1}+r_{2}} Right \; v'_{2}}(A3)$ & 
 
 $\infer{ v_{1} \posix_{r_{1}} v'_{1}} 
        {Left \; v_{1} \posix_{r_{1}+r_{2}} Left \; v'_{1}}(A4)$ 
-\medskip\\
+\smallskip\\
 
  $\infer{|v :: vs| = []} {Stars\,[] \posix_{r^{\star}} Stars\,(v :: vs)}(K1)$ & 
  $\infer{|v :: vs| \neq []} {Stars\,(v :: vs) \posix_{r^{\star}} Stars\,[]}(K2)$
-\medskip\\
+\smallskip\\
 	
 $\infer{ v_{1} \posix_{r} v_{2}} 
        {Stars\,(v_{1} :: vs_{1}) \posix_{r^{\star}} Stars\,(v_{2} :: vs_{2})}(K3)$ & 
@@ -1116,7 +1119,7 @@
 
   We have implemented the POSIX value calculation algorithm introduced in
   \cite{Sulzmann2014}. Our implementation is nearly identical to the
-  original and all modifications are harmless (like our char-clause for
+  original and all modifications we introduced are harmless (like our char-clause for
   @{text inj}). We have proved this algorithm to be correct, but correct
   according to our own specification of what POSIX values are. Our
   specification (inspired from work in \cite{Vansummeren2006}) appears to be
@@ -1125,14 +1128,14 @@
   by Sulzmann and Lu \cite{Sulzmann2014}, but we believe it contains
   unfillable gaps. In the online version of \cite{Sulzmann2014}, the authors
   already acknowledge some small problems, but our experience suggests
-  there are more serious problems. 
+  that there are more serious problems. 
   
   Closely related to our work is an automata-based lexer formalised by
   Nipkow \cite{Nipkow98}. This lexer also splits up strings into longest
   initial substrings, but Nipkow's algorithm is not completely
   computational. The algorithm by Sulzmann and Lu, in contrast, can be
-  implemented easily in functional languages. A bespoke lexer for the
-  Imp-Language is formalised in Coq as part of the Software Foundations book
+  implemented with easy in any functional language. A bespoke lexer for the
+  Imp-language is formalised in Coq as part of the Software Foundations book
   \cite{Pierce2015}. The disadvantage of such bespoke lexers is that they
   do not generalise easily to more advanced features.
   Our formalisation is available from