thys/Paper/Paper.thy
changeset 147 71f4ecc08849
parent 146 da81ffac4b10
child 148 702ed601349b
--- a/thys/Paper/Paper.thy	Fri Mar 11 13:53:53 2016 +0000
+++ b/thys/Paper/Paper.thy	Fri Mar 11 21:21:09 2016 +0000
@@ -707,7 +707,7 @@
   \[@{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \<and> (c # s\<^sub>1) @ s\<^sub>3 \<in> L r\<^sub>1 \<and> s\<^sub>4 \<in> L r\<^sub>2)"}\]
 
   \noindent We can use the induction hypothesis for @{text "r\<^sub>1"} to obtain
-  @{term "(c # s\<^sub>1) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"}. This all put together allows us to infer
+  @{term "(c # s\<^sub>1) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"}. Putting this all together allows us to infer
   @{term "((c # s\<^sub>1) @ s\<^sub>2) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (injval r\<^sub>1 c v\<^sub>1) v\<^sub>2"}. The case @{text "(c)"}
  is similar.
 
@@ -723,7 +723,7 @@
   s) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (mkeps r\<^sub>1) (injval r\<^sub>2 c v\<^sub>1)"}, as required.
 
   Finally suppose @{term "r = STAR r\<^sub>1"}. This case is very similar to the
-  sequence case, except that we need to ensure that @{term "flat (injval r\<^sub>1
+  sequence case, except that we also need to ensure that @{term "flat (injval r\<^sub>1
   c v\<^sub>1) \<noteq> []"}. This follows from @{term "(c # s\<^sub>1)
   \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"}  (which in turn follows from @{term "s\<^sub>1 \<in> der c
   r\<^sub>1 \<rightarrow> v\<^sub>1"} and the induction hypothesis).\qed
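
  A minimal Haskell sketch of the mkeps and injection clauses used in the
  sequence and star cases above (the constructor names and the partial inj
  are our illustration, not the Isabelle formalisation; the full function
  has further clauses for characters and alternatives):

    -- Sketch of the regular expressions and values from the paper;
    -- the constructor names are ours.
    data Rexp = ZERO | ONE | CHR Char
              | ALT Rexp Rexp | SEQ Rexp Rexp | STAR Rexp

    data Val = Void | Chr Char | Lft Val | Rgt Val
             | Seq Val Val | Stars [Val]

    nullable :: Rexp -> Bool
    nullable ZERO        = False
    nullable ONE         = True
    nullable (CHR _)     = False
    nullable (ALT r1 r2) = nullable r1 || nullable r2
    nullable (SEQ r1 r2) = nullable r1 && nullable r2
    nullable (STAR _)    = True

    -- mkeps builds the value witnessing how a nullable regular
    -- expression matches the empty string.
    mkeps :: Rexp -> Val
    mkeps ONE         = Void
    mkeps (ALT r1 r2) | nullable r1 = Lft (mkeps r1)
                      | otherwise   = Rgt (mkeps r2)
    mkeps (SEQ r1 r2) = Seq (mkeps r1) (mkeps r2)
    mkeps (STAR _)    = Stars []

    -- Only the injection clauses for the sequence and star cases;
    -- the Rgt clause is the one using mkeps discussed above.
    inj :: Rexp -> Char -> Val -> Val
    inj (SEQ r1 r2) c (Seq v1 v2)        = Seq (inj r1 c v1) v2
    inj (SEQ r1 r2) c (Lft (Seq v1 v2))  = Seq (inj r1 c v1) v2
    inj (SEQ r1 r2) c (Rgt v2)           = Seq (mkeps r1) (inj r2 c v2)
    inj (STAR r)    c (Seq v (Stars vs)) = Stars (inj r c v : vs)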
@@ -747,11 +747,10 @@
   \end{proof}
 
   \noindent This concludes our correctness proof. Note that we have not
-  changed the algorithm by Sulzmann and Lu,\footnote{Any deviation we introduced is harmless.} 
-  but introduced our own
-  specification for what a correct result---a POSIX value---should be.
-  A strong point in favour of Sulzmann and Lu's algorithm is that it
-  can be extended in various ways.
+  changed the algorithm of Sulzmann and Lu,\footnote{Any deviation we
+  introduced is harmless.} but introduced our own specification for what a
+  correct result---a POSIX value---should be. A strong point in favour of
+  Sulzmann and Lu's algorithm is that it can be extended in various ways.
 
 *}
 
@@ -759,11 +758,12 @@
 
 text {*
 
-  If we are interested in tokenising strings, then we need to not just
+  If we are interested in tokenising a string, then we not only need to
   split up the string into tokens, but also ``classify'' the tokens (for
-  example whether it is a keyword or an identifier). This can be
-  done with only minor modifications to the algorithm by introducing \emph{record regular
-  expressions} and \emph{record values} (for example \cite{Sulzmann2014b}):
+  example whether it is a keyword or an identifier). This can be done with
+  only minor modifications to the algorithm by introducing \emph{record
+  regular expressions} and \emph{record values} (for example
+  \cite{Sulzmann2014b}):
 
   \begin{center}  
   @{text "r :="}
@@ -776,17 +776,17 @@
   
   \noindent where @{text l} is a label, say a string, @{text r} a regular
   expression and @{text v} a value. All functions can be smoothly extended
-  to these regular expressions and values. For example \mbox{@{text "(l : r)"}} is
-  nullable iff @{term r} is, and so on. The purpose of the record regular
-  expression is to mark certain parts of a regular expression and then
-  record in the calculated value which parts of the string were matched by
-  this part. The label can then serve as classification of the tokens. For this recall the
-  regular expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} for keywords and
-  identifiers from the Introduction. With record regular expression we can
-  form \mbox{@{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"}} and then traverse the
-  calculated value and only collect the underlying strings in record values.
-  With this we obtain finite sequences of pairs of labels and strings, for 
-  example
+  to these regular expressions and values. For example \mbox{@{text "(l :
+  r)"}} is nullable iff @{term r} is, and so on. The purpose of the record
+  regular expression is to mark certain parts of a regular expression and
+  then record in the calculated value which parts of the string were matched
+  by this part. The label can then serve as a classification for the
+  tokens. For this, recall the regular expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"}
+  for keywords and identifiers from the Introduction. With record regular
+  expressions we can form \mbox{@{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"}}
+  and then traverse the calculated value and only collect the underlying
+  strings in record values. With this we obtain finite sequences of pairs of
+  labels and strings, for example
 
   \[@{text "(l\<^sub>1 : s\<^sub>1), ..., (l\<^sub>n : s\<^sub>n)"}\]
   
@@ -794,15 +794,15 @@
   identifier-token and so on) can be extracted.
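
  As a hypothetical illustration of this extraction, the sketch below
  extends the Haskell datatypes from the earlier sketch with a record
  constructor (REC, Rec and env are our names) and collects the labelled
  substrings from a calculated value:

    -- Record regular expressions and record values added to the sketch.
    data Rexp = ZERO | ONE | CHR Char | ALT Rexp Rexp
              | SEQ Rexp Rexp | STAR Rexp | REC String Rexp

    data Val = Void | Chr Char | Lft Val | Rgt Val
             | Seq Val Val | Stars [Val] | Rec String Val

    -- flat recovers the underlying string of a value.
    flat :: Val -> String
    flat Void        = ""
    flat (Chr c)     = [c]
    flat (Lft v)     = flat v
    flat (Rgt v)     = flat v
    flat (Seq v1 v2) = flat v1 ++ flat v2
    flat (Stars vs)  = concatMap flat vs
    flat (Rec _ v)   = flat v

    -- env traverses a value and collects the labelled substrings,
    -- giving the token sequence (l1 : s1), ..., (ln : sn) from above.
    env :: Val -> [(String, String)]
    env (Rec l v)   = (l, flat v) : env v
    env (Lft v)     = env v
    env (Rgt v)     = env v
    env (Seq v1 v2) = env v1 ++ env v2
    env (Stars vs)  = concatMap env vs
    env _           = []

  For the keyword/identifier expression above, env applied to the
  calculated value would yield pairs such as ("key", "if") and ("id", "x").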
 
   Derivatives as calculated by \Brz's method are usually more complex
-  regular expressions than the initial one; the result is that the matching
-  and lexing algorithms are often abysmally slow. However, various
-  optimisations are possible, such as the simplifications of @{term "ALT
-  ZERO r"}, @{term "ALT r ZERO"}, @{term "SEQ ONE r"} and @{term "SEQ r
-  ONE"} to @{term r}. These simplifications can speed up the algorithms considerably,
-  as noted in \cite{Sulzmann2014}.
-  One of the advantages of having a simple specification
-  and correctness proof is that the latter can be refined to prove the 
-  correctness of such simplification steps.
+  regular expressions than the initial one; the result is that the
+  derivative-based matching and lexing algorithms are often abysmally slow.
+  However, various optimisations are possible, such as the simplifications
+  of @{term "ALT ZERO r"}, @{term "ALT r ZERO"}, @{term "SEQ ONE r"} and
+  @{term "SEQ r ONE"} to @{term r}. These simplifications can speed up the
+  algorithms considerably, as noted in \cite{Sulzmann2014}. One of the
+  advantages of having a simple specification and correctness proof is that
+  the latter can be refined to prove the correctness of such simplification
+  steps.
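
  A hypothetical sketch of the regular-expression part of these
  simplifications, using the Rexp datatype from the sketches above (the
  rectification of values that the lexer additionally needs is the topic
  of the discussion that follows):

    -- Bottom-up simplification: ALT ZERO r, ALT r ZERO, SEQ ONE r and
    -- SEQ r ONE collapse to r.  The catch-all clause leaves STAR alone,
    -- so, as in the paper, we do not simplify under stars.
    simp :: Rexp -> Rexp
    simp (ALT r1 r2) = case (simp r1, simp r2) of
                         (ZERO, r2') -> r2'
                         (r1', ZERO) -> r1'
                         (r1', r2')  -> ALT r1' r2'
    simp (SEQ r1 r2) = case (simp r1, simp r2) of
                         (ONE, r2')  -> r2'
                         (r1', ONE)  -> r1'
                         (r1', r2')  -> SEQ r1' r2'
    simp r           = r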
 
   While the simplification of regular expressions according to 
   rules like
@@ -860,11 +860,11 @@
   \end{tabular}
   \end{center} 
 
-  \noindent where @{term "id"} stands for the identity function. 
-  This function returns a simplified regular expression and a rectification 
-  function. Note that
-  we do not simplify under stars: this seems to slow down the algorithm,
-  rather than speed up. The optimised lexer is then given by the clauses:
+  \noindent where @{term "id"} stands for the identity function. This
+  function returns a simplified regular expression and a corresponding
+  rectification function. Note that we do not simplify under stars: this
+  seems to slow down the algorithm, rather than speed it up. The optimised
+  lexer is then given by the clauses:
   
   \begin{center}
   \begin{tabular}{lcl}