# HG changeset patch # User Christian Urban # Date 1648030172 0 # Node ID e6248d2c20c258365081ed70a078d553ed52e619 # Parent 421397f267b9b86c7830001b92fff1191d5ab50a updated diff -r 421397f267b9 -r e6248d2c20c2 thys2/Paper/Paper.thy --- a/thys2/Paper/Paper.thy Wed Mar 23 00:09:08 2022 +0000 +++ b/thys2/Paper/Paper.thy Wed Mar 23 10:09:32 2022 +0000 @@ -102,7 +102,7 @@ Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part of the work by Krauss and Nipkow \cite{Krauss2011}. And another one in Coq is given by Coquand and Siles \cite{Coquand2012}. -Also Ribeiro and Du Bois give one in Agda \cite{RibeiroAgda2017}. +Also Ribeiro and Du Bois give one in Agda~\cite{RibeiroAgda2017}. However, there are two difficulties with derivative-based matchers: @@ -231,7 +231,7 @@ The constructors $+$ and $\cdot$ represent alternatives and sequences, respectively. We sometimes omit the $\cdot$ in a sequence regular expression for brevity. The - \emph{language} of a regular expression, written $L$, is defined as usual + \emph{language} of a regular expression, written $L(r)$, is defined as usual and we omit giving the definition here (see for example \cite{AusafDyckhoffUrban2016}). Central to Brzozowski's regular expression matcher are two functions @@ -1167,13 +1167,13 @@ essentially purports that we can retrieve the same value from a simplified version of the regular expression. To start with @{text retrieve} depends on the fact that the value @{text v} correspond to the - structure of the regular expressions---but the whole point of simplification + structure of the regular expression @{text r}---but the whole point of simplification is to ``destroy'' this structure by making the regular expression simpler. To see this consider the regular expression @{text "r = r' + 0"} and a corresponding value @{text "v = Left v'"}. If we annotate bitcodes to @{text "r"}, then - we can use @{text retrieve} and @{text v} in order to extract a corresponding + we can use @{text retrieve} with @{text r} and @{text v} in order to extract a corresponding bitsequence. The reason that this works is that @{text r} is an alternative - regular expression and @{text v} a corresponding value. However, if we simplify + regular expression and @{text v} a corresponding @{text "Left"}-value. However, if we simplify @{text r}, then @{text v} does not correspond to the shape of the regular expression anymore. So unless one can somehow synchronise the change in the simplified regular expressions with @@ -1403,7 +1403,7 @@ simplification method there is a chance that the the derivatives can be kept universally small (we established in this paper that they can be kept finite for any string). This is important if one is after - an efficient POSIX lexing algorithm. + an efficient POSIX lexing algorithm based on derivatives. Having proved the correctness of the POSIX lexing algorithm, which lessons have we learned? Well, we feel this is a very good example @@ -1423,7 +1423,7 @@ \cite[Page 14]{Sulzmann2014}. Given the growth of the derivatives in some cases even after aggressive simplification, this - is a hard to believe fact. A similar claim about a theoretical runtime + is a hard to believe claim. A similar claim about a theoretical runtime of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates tokens according to POSIX rules~\cite{verbatim}. For this it uses Brzozowski's derivatives like in our work. @@ -1432,12 +1432,12 @@ While their correctness proof for Verbatim is formalised in Coq, the claim about the runtime complexity is only supported by some emperical evidence obtained by using the code extraction facilities of Coq. - In the context of our observation with the ``growth problem'' of derivatives, + Given our observation with the ``growth problem'' of derivatives, we tried out their extracted OCaml code with the example \mbox{@{text "(a + aa)\<^sup>*"}} as a single lexing rule, and it took for us around 5 minutes to tokenise a string of 40 $a$'s and that increased to approximately 19 minutes when the - string is 50 $a$'s long. Given that derivatives are not simplified in the Verbatim + string is 50 $a$'s long. Taking into account that derivatives are not simplified in the Verbatim lexer, such numbers are not surprising. Clearly our result of having finite derivatives might sound rather weak in this context but we think such effeciency claims diff -r 421397f267b9 -r e6248d2c20c2 thys2/paper.pdf Binary file thys2/paper.pdf has changed