--- a/thys/Paper/Paper.thy Tue Mar 08 11:24:56 2016 +0000
+++ b/thys/Paper/Paper.thy Tue Mar 08 11:34:51 2016 +0000
@@ -52,7 +52,7 @@
F_SEQ ("F\<^bsub>Seq\<^esub> _ _") and
simp_SEQ ("simp\<^bsub>Seq\<^esub> _ _" [1000, 1000] 1) and
simp_ALT ("simp\<^bsub>Alt\<^esub> _ _" [1000, 1000] 1) and
- matcher3 ("lexer\<^sup>+ _ _")
+ matcher3 ("lexer\<^sup>+ _ _" [78,78] 77)
definition
"match r s \<equiv> nullable (ders s r)"
@@ -851,7 +851,8 @@
\end{center}
\noindent
- The main simplification function is then
+ The functions @{text "simp\<^bsub>ALT\<^esub>"} and @{text "simp\<^bsub>SEQ\<^esub>"} encode the simplification rules
+ and compose the rectification functions. The main simplification function is then
\begin{center}
\begin{tabular}{lcl}
@@ -861,7 +862,9 @@
\end{tabular}
\end{center}
- \noindent where @{term "id"} stands for the identity function. Note that
+ \noindent where @{term "id"} stands for the identity function.
+ This function returns a simplified regular expression and a rectification
+ function. Note that
we do not simplify under stars: this seems to slow down the algorithm,
rather than speed up. The optimised lexer is then given by the clauses:
@@ -879,17 +882,17 @@
\noindent
In the second clause we first calculate the derivative @{text "r \\ c"}
and then simplify the result. This gives us a simplified derivative
- @{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The matcher
- is recursively called with the simplified derivative, but before
+ @{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The lexer
+ is then recursively called with the simplified derivative, but before
we inject the character @{term c} into value, we need to rectify
- it (@{term "f\<^sub>r v"}). We can prove that
+ it (that is construct @{term "f\<^sub>r v"}). We can prove that
\begin{lemma}
@{term "matcher3 r s = matcher r s"}
\end{lemma}
\noindent
- holds but refer the reader to our mechanisation for details.
+ holds but for lack of space refer the reader to our mechanisation for details.
*}
@@ -914,7 +917,7 @@
&
$\infer{v_{2} \posix_{r_{2}} v'_{2}}
{Seq\,v_{1}\,v_{2} \posix_{r_{1}r_{2}} Seq\,v_{1}\,v'_{2}}(C1)$
-\medskip\\
+\smallskip\\
$\infer{ len |v_{2}| > len |v_{1}|}
@@ -922,18 +925,18 @@
&
$\infer{ len |v_{1}| \geq len |v_{2}|}
{Left \; v_{1} \posix_{r_{1}+r_{2}} Right \; v_{2}} (A2)$
-\medskip\\
+\smallskip\\
$\infer{ v_{2} \posix_{r_{2}} v'_{2}}
{Right \; v_{2} \posix_{r_{1}+r_{2}} Right \; v'_{2}}(A3)$ &
$\infer{ v_{1} \posix_{r_{1}} v'_{1}}
{Left \; v_{1} \posix_{r_{1}+r_{2}} Left \; v'_{1}}(A4)$
-\medskip\\
+\smallskip\\
$\infer{|v :: vs| = []} {Stars\,[] \posix_{r^{\star}} Stars\,(v :: vs)}(K1)$ &
$\infer{|v :: vs| \neq []} {Stars\,(v :: vs) \posix_{r^{\star}} Stars\,[]}(K2)$
-\medskip\\
+\smallskip\\
$\infer{ v_{1} \posix_{r} v_{2}}
{Stars\,(v_{1} :: vs_{1}) \posix_{r^{\star}} Stars\,(v_{2} :: vs_{2})}(K3)$ &
@@ -1116,7 +1119,7 @@
We have implemented the POSIX value calculation algorithm introduced in
\cite{Sulzmann2014}. Our implementation is nearly identical to the
- original and all modifications are harmless (like our char-clause for
+ original and all modifications we introduced are harmless (like our char-clause for
@{text inj}). We have proved this algorithm to be correct, but correct
according to our own specification of what POSIX values are. Our
specification (inspired from work in \cite{Vansummeren2006}) appears to be
@@ -1125,14 +1128,14 @@
by Sulzmann and Lu \cite{Sulzmann2014}, but we believe it contains
unfillable gaps. In the online version of \cite{Sulzmann2014}, the authors
already acknowledge some small problems, but our experience suggests
- there are more serious problems.
+ that there are more serious problems.
Closely related to our work is an automata-based lexer formalised by
Nipkow \cite{Nipkow98}. This lexer also splits up strings into longest
initial substrings, but Nipkow's algorithm is not completely
computational. The algorithm by Sulzmann and Lu, in contrast, can be
- implemented easily in functional languages. A bespoke lexer for the
- Imp-Language is formalised in Coq as part of the Software Foundations book
+ implemented with easy in any functional language. A bespoke lexer for the
+ Imp-language is formalised in Coq as part of the Software Foundations book
\cite{Pierce2015}. The disadvantage of such bespoke lexers is that they
do not generalise easily to more advanced features.
Our formalisation is available from