lexing: comparison thys/Paper/Paper.thy

equal deleted inserted replaced

-:da81ffac4b10
+:71f4ecc08849
 \noindent From the latter we can infer by Prop.~\ref{derprop}(2):
 \[@{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \<and> (c # s\<^sub>1) @ s\<^sub>3 \<in> L r\<^sub>1 \<and> s\<^sub>4 \<in> L r\<^sub>2)"}\]
 \noindent We can use the induction hypothesis for @{text "r\<^sub>1"} to obtain
-@{term "(c # s\<^sub>1) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"}. This all put together allows us to infer
+@{term "(c # s\<^sub>1) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"}. Putting this all together allows us to infer
 @{term "((c # s\<^sub>1) @ s\<^sub>2) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (injval r\<^sub>1 c v\<^sub>1) v\<^sub>2"}. The case @{text "(c)"}
 is similarly.
 For @{text "(b)"} we know @{term "s \<in> der c r\<^sub>2 \<rightarrow> v\<^sub>1"} and
 @{term "s\<^sub>1 @ s\<^sub>2 \<notin> L (SEQ (der c r\<^sub>1) r\<^sub>2)"}. From the former
 \noindent By Lem.~\ref{lemmkeps} we know @{term "[] \<in> r\<^sub>1 \<rightarrow> (mkeps r\<^sub>1)"}
 holds. Putting this all together, we can conclude with @{term "(c #
 s) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (mkeps r\<^sub>1) (injval r\<^sub>2 c v\<^sub>1)"}, as required.
 Finally suppose @{term "r = STAR r\<^sub>1"}. This case is very similar to the
-sequence case, except that we need to ensure that @{term "flat (injval r\<^sub>1
+sequence case, except that we need to also ensure that @{term "flat (injval r\<^sub>1
 c v\<^sub>1) \<noteq> []"}. This follows from @{term "(c # s\<^sub>1)
 \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"}  (which in turn follows from @{term "s\<^sub>1 \<in> der c
 r\<^sub>1 \<rightarrow> v\<^sub>1"} and the induction hypothesis).\qed
 \end{proof}
 \begin{proof}
 By induction on @{term s} using Lem.~\ref{lemmkeps} and \ref{Posix2}.\qed
 \end{proof}
 \noindent This concludes our correctness proof. Note that we have not
-changed the algorithm by Sulzmann and Lu,\footnote{Any deviation we introduced is harmless.}
+changed the algorithm of Sulzmann and Lu,\footnote{Any deviation we
-but introduced our own
+introduced is harmless.} but introduced our own specification for what a
-specification for what a correct result---a POSIX value---should be.
+correct result---a POSIX value---should be. A strong point in favour of
-A strong point in favour of Sulzmann and Lu's algorithm is that it
+Sulzmann and Lu's algorithm is that it can be extended in various ways.
-can be extended in various ways.
 *}
 section {* Extensions and Optimisations*}
 text {*
-If we are interested in tokenising strings, then we need to not just
+If we are interested in tokenising a string, then we need to not just
 split up the string into tokens, but also ``classify'' the tokens (for
-example whether it is a keyword or an identifier). This can be
+example whether it is a keyword or an identifier). This can be done with
-done with only minor modifications to the algorithm by introducing \emph{record regular
+only minor modifications to the algorithm by introducing \emph{record
-expressions} and \emph{record values} (for example \cite{Sulzmann2014b}):
+regular expressions} and \emph{record values} (for example
+\cite{Sulzmann2014b}):
 \begin{center}
 @{text "r :="}
 @{text "..."} $\mid$
 @{text "(l : r)"} \qquad\qquad
 @{text "(l : v)"}
 \end{center}
 \noindent where @{text l} is a label, say a string, @{text r} a regular
 expression and @{text v} a value. All functions can be smoothly extended
-to these regular expressions and values. For example \mbox{@{text "(l : r)"}} is
+to these regular expressions and values. For example \mbox{@{text "(l :
-nullable iff @{term r} is, and so on. The purpose of the record regular
+r)"}} is nullable iff @{term r} is, and so on. The purpose of the record
-expression is to mark certain parts of a regular expression and then
+regular expression is to mark certain parts of a regular expression and
-record in the calculated value which parts of the string were matched by
+then record in the calculated value which parts of the string were matched
-this part. The label can then serve as classification of the tokens. For this recall the
+by this part. The label can then serve as classification for the tokens.
-regular expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} for keywords and
+For this recall the regular expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} for
-identifiers from the Introduction. With record regular expression we can
+keywords and identifiers from the Introduction. With record regular
-form \mbox{@{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"}} and then traverse the
+expression we can form \mbox{@{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"}}
-calculated value and only collect the underlying strings in record values.
+and then traverse the calculated value and only collect the underlying
-With this we obtain finite sequences of pairs of labels and strings, for
+strings in record values. With this we obtain finite sequences of pairs of
-example
+labels and strings, for example
 \[@{text "(l\<^sub>1 : s\<^sub>1), ..., (l\<^sub>n : s\<^sub>n)"}\]
 \noindent from which tokens with classifications (keyword-token,
 identifier-token and so on) can be extracted.
 Derivatives as calculated by \Brz's method are usually more complex
-regular expressions than the initial one; the result is that the matching
+regular expressions than the initial one; the result is that the
-and lexing algorithms are often abysmally slow. However, various
+deivative-based matching and lexing algorithms are often abysmally slow.
-optimisations are possible, such as the simplifications of @{term "ALT
+However, various optimisations are possible, such as the simplifications
-ZERO r"}, @{term "ALT r ZERO"}, @{term "SEQ ONE r"} and @{term "SEQ r
+of @{term "ALT ZERO r"}, @{term "ALT r ZERO"}, @{term "SEQ ONE r"} and
-ONE"} to @{term r}. These simplifications can speed up the algorithms considerably,
+@{term "SEQ r ONE"} to @{term r}. These simplifications can speed up the
-as noted in \cite{Sulzmann2014}.
+algorithms considerably, as noted in \cite{Sulzmann2014}. One of the
-One of the advantages of having a simple specification
+advantages of having a simple specification and correctness proof is that
-and correctness proof is that the latter can be refined to prove the
+the latter can be refined to prove the correctness of such simplification
-correctness of such simplification steps.
+steps.
 While the simplification of regular expressions according to
 rules like
 \begin{center}
 @{term "simp (SEQ r\<^sub>1 r\<^sub>2)"} & $\dn$ & @{term "simp_SEQ (simp r\<^sub>1) (simp r\<^sub>2)"}\\
 @{term "simp r"} & $\dn$ & @{term "(r, id)"}\\
 \end{tabular}
 \end{center}
-\noindent where @{term "id"} stands for the identity function.
+\noindent where @{term "id"} stands for the identity function. This
-This function returns a simplified regular expression and a rectification
+function returns a simplified regular expression and a corresponding
-function. Note that
+rectification function. Note that we do not simplify under stars: this
-we do not simplify under stars: this seems to slow down the algorithm,
+seems to slow down the algorithm, rather than speed up. The optimised
-rather than speed up. The optimised lexer is then given by the clauses:
+lexer is then given by the clauses:
 \begin{center}
 \begin{tabular}{lcl}
 @{thm (lhs) slexer.simps(1)} & $\dn$ & @{thm (rhs) slexer.simps(1)}\\
 @{thm (lhs) slexer.simps(2)} & $\dn$ &

changeset 147	71f4ecc08849
parent 146	da81ffac4b10
child 148	702ed601349b