--- a/thys2/Paper/Paper.thy Wed Mar 23 00:09:08 2022 +0000
+++ b/thys2/Paper/Paper.thy Wed Mar 23 10:09:32 2022 +0000
@@ -102,7 +102,7 @@
Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part
of the work by Krauss and Nipkow \cite{Krauss2011}. And another one
in Coq is given by Coquand and Siles \cite{Coquand2012}.
-Also Ribeiro and Du Bois give one in Agda \cite{RibeiroAgda2017}.
+Also Ribeiro and Du Bois give one in Agda~\cite{RibeiroAgda2017}.
However, there are two difficulties with derivative-based matchers:
@@ -231,7 +231,7 @@
The constructors $+$ and $\cdot$ represent alternatives and sequences, respectively.
We sometimes omit the $\cdot$ in a sequence regular expression for brevity.
The
- \emph{language} of a regular expression, written $L$, is defined as usual
+ \emph{language} of a regular expression, written $L(r)$, is defined as usual
and we omit giving the definition here (see for example \cite{AusafDyckhoffUrban2016}).
Central to Brzozowski's regular expression matcher are two functions
@@ -1167,13 +1167,13 @@
essentially purports that we can retrieve the same value from a
simplified version of the regular expression. To start with @{text retrieve}
depends on the fact that the value @{text v} correspond to the
- structure of the regular expressions---but the whole point of simplification
+ structure of the regular expression @{text r}---but the whole point of simplification
is to ``destroy'' this structure by making the regular expression simpler.
To see this consider the regular expression @{text "r = r' + 0"} and a corresponding
value @{text "v = Left v'"}. If we annotate bitcodes to @{text "r"}, then
- we can use @{text retrieve} and @{text v} in order to extract a corresponding
+ we can use @{text retrieve} with @{text r} and @{text v} in order to extract a corresponding
bitsequence. The reason that this works is that @{text r} is an alternative
- regular expression and @{text v} a corresponding value. However, if we simplify
+ regular expression and @{text v} a corresponding @{text "Left"}-value. However, if we simplify
@{text r}, then @{text v} does not correspond to the shape of the regular
expression anymore. So unless one can somehow
synchronise the change in the simplified regular expressions with
@@ -1403,7 +1403,7 @@
simplification method there is a chance that the the derivatives
can be kept universally small (we established in this paper that
they can be kept finite for any string). This is important if one is after
- an efficient POSIX lexing algorithm.
+ an efficient POSIX lexing algorithm based on derivatives.
Having proved the correctness of the POSIX lexing algorithm, which
lessons have we learned? Well, we feel this is a very good example
@@ -1423,7 +1423,7 @@
\cite[Page 14]{Sulzmann2014}.
Given the growth of the
derivatives in some cases even after aggressive simplification, this
- is a hard to believe fact. A similar claim about a theoretical runtime
+ is a hard to believe claim. A similar claim about a theoretical runtime
of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates
tokens according to POSIX rules~\cite{verbatim}. For this it uses Brzozowski's
derivatives like in our work.
@@ -1432,12 +1432,12 @@
While their correctness proof for Verbatim is formalised in Coq, the claim about
the runtime complexity is only supported by some emperical evidence obtained
by using the code extraction facilities of Coq.
- In the context of our observation with the ``growth problem'' of derivatives,
+ Given our observation with the ``growth problem'' of derivatives,
we
tried out their extracted OCaml code with the example
\mbox{@{text "(a + aa)\<^sup>*"}} as a single lexing rule, and it took for us around 5 minutes to tokenise a
string of 40 $a$'s and that increased to approximately 19 minutes when the
- string is 50 $a$'s long. Given that derivatives are not simplified in the Verbatim
+ string is 50 $a$'s long. Taking into account that derivatives are not simplified in the Verbatim
lexer, such numbers are not surprising.
Clearly our result of having finite
derivatives might sound rather weak in this context but we think such effeciency claims
Binary file thys2/paper.pdf has changed