--- a/thys2/Paper/Paper.thy Tue Mar 29 10:57:02 2022 +0100
+++ b/thys2/Paper/Paper.thy Wed Mar 30 18:02:40 2022 +0100
@@ -100,7 +100,7 @@
then @{term r} matches @{term s} (and {\em vice versa}). We are aware
of a mechanised correctness proof of Brzozowski's derivative-based matcher in HOL4 by
Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part
-of the work by Krauss and Nipkow \cite{Krauss2011}. And another one
+of the work by Krauss and Nipkow~\cite{Krauss2011}. And another one
in Coq is given by Coquand and Siles~\cite{Coquand2012}.
Also Ribeiro and Du Bois give one in Agda~\cite{RibeiroAgda2017}.
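
For readers not familiar with derivatives, the following minimal sketch of a
Brzozowski-style matcher may help (written here in Haskell, using the standard
textbook definitions rather than the exact ones of the formalisations cited
above):

  -- Minimal Brzozowski-style matcher (textbook definitions; the names
  -- are ours, not those of the cited mechanised proofs).
  data Rexp = Zero | One | Chr Char
            | Alt Rexp Rexp | Seq Rexp Rexp | Star Rexp

  -- nullable r holds iff r matches the empty string
  nullable :: Rexp -> Bool
  nullable Zero        = False
  nullable One         = True
  nullable (Chr _)     = False
  nullable (Alt r1 r2) = nullable r1 || nullable r2
  nullable (Seq r1 r2) = nullable r1 && nullable r2
  nullable (Star _)    = True

  -- deriv c r describes what remains to be matched after r has
  -- consumed the character c
  deriv :: Char -> Rexp -> Rexp
  deriv _ Zero        = Zero
  deriv _ One         = Zero
  deriv c (Chr d)     = if c == d then One else Zero
  deriv c (Alt r1 r2) = Alt (deriv c r1) (deriv c r2)
  deriv c (Seq r1 r2)
    | nullable r1     = Alt (Seq (deriv c r1) r2) (deriv c r2)
    | otherwise       = Seq (deriv c r1) r2
  deriv c (Star r)    = Seq (deriv c r) (Star r)

  -- r matches s iff the iterated derivative is nullable
  matches :: Rexp -> String -> Bool
  matches r s = nullable (foldl (flip deriv) r s)

For instance, matches (Star (Chr 'a')) "aa" evaluates to True.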
@@ -161,7 +161,7 @@
when derivatives are calculated. The compact representation
of bitsequences and regular expressions allows them to define a more
``aggressive'' simplification method that keeps the size of the
-derivatives finite no matter what the length of the string is.
+derivatives finitely bounded no matter how long the string is.
They make some informal claims about the correctness and linear behaviour
of this version, but do not provide any supporting proof arguments, not
even ``pencil-and-paper'' arguments. They write about their bitcoded
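
As a rough picture of the data structure discussed here, a bitcoded regular
expression can be sketched in Haskell as follows; the constructor names and
the representation of bitsequences as Boolean lists are our illustrative
choices, not Sulzmann and Lu's exact definitions:

  type Bits = [Bool]

  -- Annotated ("bitcoded") regular expressions: each constructor
  -- carries a bitsequence recording which alternatives were taken,
  -- so that a value can be decoded after a successful match.
  data ARexp = AZero
             | AOne  Bits
             | AChr  Bits Char
             | AAlts Bits [ARexp]     -- list-based alternatives
             | ASeq  Bits ARexp ARexp
             | AStar Bits ARexp

  -- fuse prepends bits to the top-most annotation; derivatives use it
  -- to push partial value information into subexpressions
  fuse :: Bits -> ARexp -> ARexp
  fuse _  AZero           = AZero
  fuse bs (AOne cs)       = AOne  (bs ++ cs)
  fuse bs (AChr cs c)     = AChr  (bs ++ cs) c
  fuse bs (AAlts cs rs)   = AAlts (bs ++ cs) rs
  fuse bs (ASeq cs r1 r2) = ASeq  (bs ++ cs) r1 r2
  fuse bs (AStar cs r)    = AStar (bs ++ cs) r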
@@ -195,7 +195,7 @@
simplifications are included.\medskip
\noindent In this paper, we shall first briefly introduce the basic notions
-of regular expressions and describe the basic definitions
+of regular expressions and describe the definition
of POSIX lexing from our earlier work~\cite{AusafDyckhoffUrban2016}. This serves
as a reference point for what correctness means in our Isabelle/HOL proofs. We shall then prove
the correctness for the bitcoded algorithm without simplification, and
@@ -358,7 +358,7 @@
lexing acquired its name from the fact that the corresponding
rules were described as part of the POSIX specification for
  Unix-like operating systems~\cite{POSIX}.} Sometimes these
- informal rules are called \emph{maximal much rule} and \emph{rule priority}.
+ informal rules are called the \emph{maximal munch rule} and \emph{rule priority}.
One contribution of our earlier paper is to give a convenient
specification for what POSIX values are (the inductive rules are shown in
Figure~\ref{POSIXrules}).
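
To illustrate the two rules with a small example (ours, not one taken from the
cited papers): consider @{text "r = (a + ab)(b + 1)"}, where @{text 1} matches
only the empty string, and the input @{text "ab"}. There are two possible
matches: the left factor consumes @{text a} and the right factor @{text b}, or
the left factor consumes @{text "ab"} and the right factor the empty string.
The maximal munch rule selects the second match, because the earlier
subexpression must consume as much of the input as possible; rule priority
only breaks the remaining ties, for example preferring the left alternative in
@{text "a + a"} when both alternatives match the same string.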
@@ -512,7 +512,7 @@
\end{proposition}
\noindent
- In fact we have shown that in the success case the generated POSIX value $v$ is
+ In fact we have shown that, in the success case, the generated POSIX value $v$ is
  unique and, in the failure case, that there is no POSIX value $v$ that satisfies
$(s, r) \rightarrow v$. While the algorithm is correct, it is excruciatingly
slow in cases where the derivatives grow arbitrarily (recall the example from the
@@ -1166,7 +1166,7 @@
\noindent do not hold under simplification---this property
  essentially states that we can retrieve the same value from a
simplified version of the regular expression. To start with @{text retrieve}
- depends on the fact that the value @{text v} correspond to the
+ depends on the fact that the value @{text v} corresponds to the
structure of the regular expression @{text r}---but the whole point of simplification
is to ``destroy'' this structure by making the regular expression simpler.
To see this consider the regular expression @{text "r = r' + 0"} and a corresponding
@@ -1300,8 +1300,8 @@
@{thm[mode=Axiom] bs6}$A0$\qquad
@{thm[mode=Axiom] bs7}$A1$\\
@{thm[mode=Rule] bs8[of "rs\<^sub>1" "rs\<^sub>2"]}$AL$\\
- @{thm[mode=Rule] ss2[of "rs\<^sub>1" "rs\<^sub>2"]}$LH$\qquad
- @{thm[mode=Rule] ss3[of "r\<^sub>1" "r\<^sub>2"]}$LT$\\
+ @{thm[mode=Rule] ss2[of "rs\<^sub>1" "rs\<^sub>2"]}$LT$\qquad
+ @{thm[mode=Rule] ss3[of "r\<^sub>1" "r\<^sub>2"]}$LH$\\
@{thm[mode=Axiom] ss4}$L\ZERO$\qquad
@{thm[mode=Axiom] ss5[of "bs" "rs\<^sub>1" "rs\<^sub>2"]}$LS$\medskip\\
@{thm[mode=Rule] ss6[of "r\<^sub>2" "r\<^sub>1" "rs\<^sub>1" "rs\<^sub>2" "rs\<^sub>3"]}$LD$\\
@@ -1400,7 +1400,7 @@
to introduce our own definitions and proof ideas in order to establish the
correctness. Our interest in the second algorithm
lies in the fact that by using bitcoded regular expressions and an aggressive
- simplification method there is a chance that the the derivatives
+ simplification method there is a chance that the derivatives
can be kept universally small (we established in this paper that
  their size can be kept finitely bounded for any string). This is important if one is after
an efficient POSIX lexing algorithm based on derivatives.
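
To give a flavour of such an aggressive simplification, here is a deliberately
stripped-down sketch in Haskell: it flattens nested alternatives, removes
@{text ZERO} summands and duplicate summands, and collapses degenerate
sequences. The verified simplification in the paper works on bitcoded regular
expressions and additionally has to maintain the annotations (via fusing),
which this sketch omits.

  import Data.List (nub)

  -- Plain regular expressions with list-based alternatives
  -- (bitcode annotations omitted for brevity).
  data Rexp = Zero | One | Chr Char
            | Alts [Rexp] | Seq Rexp Rexp | Star Rexp
            deriving Eq

  -- One simplification pass: simplify subexpressions, flatten nested
  -- alternatives, drop Zero summands and duplicates, and collapse
  -- the degenerate sequence cases.
  simp :: Rexp -> Rexp
  simp (Seq r1 r2) = case (simp r1, simp r2) of
    (Zero, _)  -> Zero
    (_, Zero)  -> Zero
    (One, r)   -> r
    (r, One)   -> r
    (r1', r2') -> Seq r1' r2'
  simp (Alts rs) =
    case nub (filter (/= Zero) (concatMap flatten (map simp rs))) of
      []  -> Zero
      [r] -> r
      rs' -> Alts rs'
    where
      flatten (Alts rs') = rs'
      flatten r          = [r]
  simp r = r

Interleaving a pass like this with every derivative step is the mechanism by
which the derivatives are kept small.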
@@ -1425,9 +1425,9 @@
  derivatives in some cases even after aggressive simplification, this
  claim is hard to believe. A similar claim about a theoretical runtime
of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates
- tokens according to POSIX rules~\cite{verbatim}. For this it uses Brzozowski's
+ tokens according to POSIX rules~\cite{verbatim}. For this Verbatim uses Brzozowski's
  derivatives as in our work.
- They write: ``The results of our empirical tests [..] confirm that Verbatim has
+ The authors write: ``The results of our empirical tests [..] confirm that Verbatim has
@{text "O(n\<^sup>2)"} time complexity.'' \cite[Section~VII]{verbatim}.
While their correctness proof for Verbatim is formalised in Coq, the claim about
  the runtime complexity is only supported by some empirical evidence obtained
--- a/thys2/Paper/document/root.tex Tue Mar 29 10:57:02 2022 +0100
+++ b/thys2/Paper/document/root.tex Wed Mar 30 18:02:40 2022 +0100
@@ -85,7 +85,7 @@
The purpose of the bitcodes is to generate POSIX values incrementally while
derivatives are calculated. They also help with designing
an ``aggressive'' simplification function that keeps the size of
- derivatives finite. Without simplification the size of some derivatives can grow
+ derivatives finitely bounded. Without simplification, the size of some derivatives can grow
  arbitrarily large, resulting in an extremely slow lexing algorithm. In this
  paper we describe a variant of Sulzmann and Lu's algorithm: our
variant is a recursive functional program, whereas Sulzmann
Binary file thys2/paper.pdf has changed