updated
authorChristian Urban <christian.urban@kcl.ac.uk>
Wed, 23 Mar 2022 10:09:32 +0000
changeset 464 e6248d2c20c2
parent 463 421397f267b9
child 465 2e7c7111c0be
updated
thys2/Paper/Paper.thy
thys2/paper.pdf
--- a/thys2/Paper/Paper.thy	Wed Mar 23 00:09:08 2022 +0000
+++ b/thys2/Paper/Paper.thy	Wed Mar 23 10:09:32 2022 +0000
@@ -102,7 +102,7 @@
 Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part
 of the work by Krauss and Nipkow \cite{Krauss2011}.  And another one
 in Coq is given by Coquand and Siles \cite{Coquand2012}.
-Also Ribeiro and Du Bois give one in Agda \cite{RibeiroAgda2017}.
+Also Ribeiro and Du Bois give one in Agda~\cite{RibeiroAgda2017}.
 
 
 However, there are two difficulties with derivative-based matchers:
@@ -231,7 +231,7 @@
   The constructors $+$ and $\cdot$ represent alternatives and sequences, respectively.
   We sometimes omit the $\cdot$ in a sequence regular expression for brevity. 
   The
-  \emph{language} of a regular expression, written $L$, is defined as usual
+  \emph{language} of a regular expression, written $L(r)$, is defined as usual
   and we omit giving the definition here (see for example \cite{AusafDyckhoffUrban2016}).
 
   Central to Brzozowski's regular expression matcher are two functions
@@ -1167,13 +1167,13 @@
      essentially purports that we can retrieve the same value from a
      simplified version of the regular expression. To start with @{text retrieve}
      depends on the fact that the value @{text v} correspond to the
-     structure of the regular expressions---but the whole point of simplification
+     structure of the regular expression @{text r}---but the whole point of simplification
      is to ``destroy'' this structure by making the regular expression simpler.
      To see this consider the regular expression @{text "r = r' + 0"} and a corresponding
      value @{text "v = Left v'"}. If we annotate bitcodes to @{text "r"}, then 
-     we can use @{text retrieve} and @{text v} in order to extract a corresponding
+     we can use @{text retrieve} with @{text r} and @{text v} in order to extract a corresponding
      bitsequence. The reason that this works is that @{text r} is an alternative
-     regular expression and @{text v} a corresponding value. However, if we simplify
+     regular expression and @{text v} a corresponding @{text "Left"}-value. However, if we simplify
      @{text r}, then @{text v} does not correspond to the shape of the regular 
      expression anymore. So unless one can somehow
      synchronise the change in the simplified regular expressions with
@@ -1403,7 +1403,7 @@
    simplification method there is a chance that the the derivatives
    can be kept universally small  (we established in this paper that
    they can be kept finite for any string). This is important if one is after
-   an efficient POSIX lexing algorithm.
+   an efficient POSIX lexing algorithm based on derivatives.
 
    Having proved the correctness of the POSIX lexing algorithm, which
    lessons have we learned? Well, we feel this is a very good example
@@ -1423,7 +1423,7 @@
    \cite[Page 14]{Sulzmann2014}. 
    Given the growth of the
    derivatives in some cases even after aggressive simplification, this
-   is a hard to believe fact. A similar claim about a theoretical runtime
+   is a hard to believe claim. A similar claim about a theoretical runtime
    of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates
    tokens according to POSIX rules~\cite{verbatim}. For this it uses Brzozowski's
    derivatives like in our work. 
@@ -1432,12 +1432,12 @@
    While their correctness proof for Verbatim is formalised in Coq, the claim about
    the runtime complexity is only supported by some emperical evidence obtained
    by using the code extraction facilities of Coq.
-   In the context of our observation with the ``growth problem'' of derivatives,
+   Given our observation with the ``growth problem'' of derivatives,
    we
    tried out their extracted OCaml code with the example
    \mbox{@{text "(a + aa)\<^sup>*"}} as a single lexing rule, and it took for us around 5 minutes to tokenise a
    string of 40 $a$'s and that increased to approximately 19 minutes when the
-   string is 50 $a$'s long. Given that derivatives are not simplified in the Verbatim
+   string is 50 $a$'s long. Taking into account that derivatives are not simplified in the Verbatim
    lexer, such numbers are not surprising. 
    Clearly our result of having finite
    derivatives might sound rather weak in this context but we think such effeciency claims
Binary file thys2/paper.pdf has changed