--- a/thys/Paper/Paper.thy Fri Mar 11 13:53:53 2016 +0000
+++ b/thys/Paper/Paper.thy Fri Mar 11 21:21:09 2016 +0000
@@ -707,7 +707,7 @@
\[@{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \<and> (c # s\<^sub>1) @ s\<^sub>3 \<in> L r\<^sub>1 \<and> s\<^sub>4 \<in> L r\<^sub>2)"}\]
\noindent We can use the induction hypothesis for @{text "r\<^sub>1"} to obtain
- @{term "(c # s\<^sub>1) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"}. This all put together allows us to infer
+ @{term "(c # s\<^sub>1) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"}. Putting this all together allows us to infer
@{term "((c # s\<^sub>1) @ s\<^sub>2) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (injval r\<^sub>1 c v\<^sub>1) v\<^sub>2"}. The case @{text "(c)"}
is similarly.
@@ -723,7 +723,7 @@
s) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (mkeps r\<^sub>1) (injval r\<^sub>2 c v\<^sub>1)"}, as required.
Finally suppose @{term "r = STAR r\<^sub>1"}. This case is very similar to the
- sequence case, except that we need to ensure that @{term "flat (injval r\<^sub>1
+ sequence case, except that we need to also ensure that @{term "flat (injval r\<^sub>1
c v\<^sub>1) \<noteq> []"}. This follows from @{term "(c # s\<^sub>1)
\<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"} (which in turn follows from @{term "s\<^sub>1 \<in> der c
r\<^sub>1 \<rightarrow> v\<^sub>1"} and the induction hypothesis).\qed
@@ -747,11 +747,10 @@
\end{proof}
\noindent This concludes our correctness proof. Note that we have not
- changed the algorithm by Sulzmann and Lu,\footnote{Any deviation we introduced is harmless.}
- but introduced our own
- specification for what a correct result---a POSIX value---should be.
- A strong point in favour of Sulzmann and Lu's algorithm is that it
- can be extended in various ways.
+ changed the algorithm of Sulzmann and Lu,\footnote{Any deviation we
+ introduced is harmless.} but introduced our own specification for what a
+ correct result---a POSIX value---should be. A strong point in favour of
+ Sulzmann and Lu's algorithm is that it can be extended in various ways.
*}
@@ -759,11 +758,12 @@
text {*
- If we are interested in tokenising strings, then we need to not just
+ If we are interested in tokenising a string, then we need to not just
split up the string into tokens, but also ``classify'' the tokens (for
- example whether it is a keyword or an identifier). This can be
- done with only minor modifications to the algorithm by introducing \emph{record regular
- expressions} and \emph{record values} (for example \cite{Sulzmann2014b}):
+ example whether it is a keyword or an identifier). This can be done with
+ only minor modifications to the algorithm by introducing \emph{record
+ regular expressions} and \emph{record values} (for example
+ \cite{Sulzmann2014b}):
\begin{center}
@{text "r :="}
@@ -776,17 +776,17 @@
\noindent where @{text l} is a label, say a string, @{text r} a regular
expression and @{text v} a value. All functions can be smoothly extended
- to these regular expressions and values. For example \mbox{@{text "(l : r)"}} is
- nullable iff @{term r} is, and so on. The purpose of the record regular
- expression is to mark certain parts of a regular expression and then
- record in the calculated value which parts of the string were matched by
- this part. The label can then serve as classification of the tokens. For this recall the
- regular expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} for keywords and
- identifiers from the Introduction. With record regular expression we can
- form \mbox{@{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"}} and then traverse the
- calculated value and only collect the underlying strings in record values.
- With this we obtain finite sequences of pairs of labels and strings, for
- example
+ to these regular expressions and values. For example \mbox{@{text "(l :
+ r)"}} is nullable iff @{term r} is, and so on. The purpose of the record
+ regular expression is to mark certain parts of a regular expression and
+ then record in the calculated value which parts of the string were matched
+ by this part. The label can then serve as classification for the tokens.
+ For this recall the regular expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} for
+ keywords and identifiers from the Introduction. With record regular
+ expression we can form \mbox{@{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"}}
+ and then traverse the calculated value and only collect the underlying
+ strings in record values. With this we obtain finite sequences of pairs of
+ labels and strings, for example
\[@{text "(l\<^sub>1 : s\<^sub>1), ..., (l\<^sub>n : s\<^sub>n)"}\]
@@ -794,15 +794,15 @@
identifier-token and so on) can be extracted.
Derivatives as calculated by \Brz's method are usually more complex
- regular expressions than the initial one; the result is that the matching
- and lexing algorithms are often abysmally slow. However, various
- optimisations are possible, such as the simplifications of @{term "ALT
- ZERO r"}, @{term "ALT r ZERO"}, @{term "SEQ ONE r"} and @{term "SEQ r
- ONE"} to @{term r}. These simplifications can speed up the algorithms considerably,
- as noted in \cite{Sulzmann2014}.
- One of the advantages of having a simple specification
- and correctness proof is that the latter can be refined to prove the
- correctness of such simplification steps.
+ regular expressions than the initial one; the result is that the
+ deivative-based matching and lexing algorithms are often abysmally slow.
+ However, various optimisations are possible, such as the simplifications
+ of @{term "ALT ZERO r"}, @{term "ALT r ZERO"}, @{term "SEQ ONE r"} and
+ @{term "SEQ r ONE"} to @{term r}. These simplifications can speed up the
+ algorithms considerably, as noted in \cite{Sulzmann2014}. One of the
+ advantages of having a simple specification and correctness proof is that
+ the latter can be refined to prove the correctness of such simplification
+ steps.
While the simplification of regular expressions according to
rules like
@@ -860,11 +860,11 @@
\end{tabular}
\end{center}
- \noindent where @{term "id"} stands for the identity function.
- This function returns a simplified regular expression and a rectification
- function. Note that
- we do not simplify under stars: this seems to slow down the algorithm,
- rather than speed up. The optimised lexer is then given by the clauses:
+ \noindent where @{term "id"} stands for the identity function. This
+ function returns a simplified regular expression and a corresponding
+ rectification function. Note that we do not simplify under stars: this
+ seems to slow down the algorithm, rather than speed up. The optimised
+ lexer is then given by the clauses:
\begin{center}
\begin{tabular}{lcl}