--- a/thys2/Paper/Paper.thy Tue Mar 29 10:57:02 2022 +0100
+++ b/thys2/Paper/Paper.thy Wed Mar 30 18:02:40 2022 +0100
@@ -100,7 +100,7 @@
then @{term r} matches @{term s} (and {\em vice versa}). We are aware
of a mechanised correctness proof of Brzozowski's derivative-based matcher in HOL4 by
Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part
-of the work by Krauss and Nipkow \cite{Krauss2011}. And another one
+of the work by Krauss and Nipkow~\cite{Krauss2011}. And another one
in Coq is given by Coquand and Siles~\cite{Coquand2012}.
Also Ribeiro and Du Bois give one in Agda~\cite{RibeiroAgda2017}.
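
For readers not familiar with derivatives, the following minimal sketch of a
Brzozowski-style matcher may help (written here in Haskell, using the standard
textbook definitions rather than the exact ones of the formalisations cited
above):

  -- Minimal Brzozowski-style matcher (textbook definitions; the names
  -- are ours, not those of the cited mechanised proofs).
  data Rexp = Zero | One | Chr Char
            | Alt Rexp Rexp | Seq Rexp Rexp | Star Rexp

  -- nullable r holds iff r matches the empty string
  nullable :: Rexp -> Bool
  nullable Zero        = False
  nullable One         = True
  nullable (Chr _)     = False
  nullable (Alt r1 r2) = nullable r1 || nullable r2
  nullable (Seq r1 r2) = nullable r1 && nullable r2
  nullable (Star _)    = True

  -- deriv c r describes what remains to be matched after r has
  -- consumed the character c
  deriv :: Char -> Rexp -> Rexp
  deriv _ Zero        = Zero
  deriv _ One         = Zero
  deriv c (Chr d)     = if c == d then One else Zero
  deriv c (Alt r1 r2) = Alt (deriv c r1) (deriv c r2)
  deriv c (Seq r1 r2)
    | nullable r1     = Alt (Seq (deriv c r1) r2) (deriv c r2)
    | otherwise       = Seq (deriv c r1) r2
  deriv c (Star r)    = Seq (deriv c r) (Star r)

  -- r matches s iff the iterated derivative is nullable
  matches :: Rexp -> String -> Bool
  matches r s = nullable (foldl (flip deriv) r s)

For instance, matches (Star (Chr 'a')) "aa" evaluates to True.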
@@ -161,7 +161,7 @@
when derivatives are calculated. The compact representation
of bitsequences and regular expressions allows them to define a more
``aggressive'' simplification method that keeps the size of the
-derivatives finite no matter what the length of the string is.
+derivatives finitely bounded no matter how long the string is.
They make some informal claims about the correctness and linear behaviour
of this version, but do not provide any supporting proof arguments, not
even ``pencil-and-paper'' arguments. They write about their bitcoded
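
As a rough picture of the data structure discussed here, a bitcoded regular
expression can be sketched in Haskell as follows; the constructor names and
the representation of bitsequences as Boolean lists are our illustrative
choices, not Sulzmann and Lu's exact definitions:

  type Bits = [Bool]

  -- Annotated ("bitcoded") regular expressions: each constructor
  -- carries a bitsequence recording which alternatives were taken,
  -- so that a value can be decoded after a successful match.
  data ARexp = AZero
             | AOne  Bits
             | AChr  Bits Char
             | AAlts Bits [ARexp]     -- list-based alternatives
             | ASeq  Bits ARexp ARexp
             | AStar Bits ARexp

  -- fuse prepends bits to the top-most annotation; derivatives use it
  -- to push partial value information into subexpressions
  fuse :: Bits -> ARexp -> ARexp
  fuse _  AZero           = AZero
  fuse bs (AOne cs)       = AOne  (bs ++ cs)
  fuse bs (AChr cs c)     = AChr  (bs ++ cs) c
  fuse bs (AAlts cs rs)   = AAlts (bs ++ cs) rs
  fuse bs (ASeq cs r1 r2) = ASeq  (bs ++ cs) r1 r2
  fuse bs (AStar cs r)    = AStar (bs ++ cs) r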
@@ -195,7 +195,7 @@
simplifications are included.\medskip
\noindent In this paper, we shall first briefly introduce the basic notions
-of regular expressions and describe the basic definitions
+of regular expressions and describe the definition
of POSIX lexing from our earlier work~\cite{AusafDyckhoffUrban2016}. This serves
as a reference point for what correctness means in our Isabelle/HOL proofs. We shall then prove
the correctness for the bitcoded algorithm without simplification, and
@@ -358,7 +358,7 @@
lexing acquired its name from the fact that the corresponding
rules were described as part of the POSIX specification for
  Unix-like operating systems~\cite{POSIX}.} Sometimes these
- informal rules are called \emph{maximal much rule} and \emph{rule priority}.
+ informal rules are called the \emph{maximal munch rule} and \emph{rule priority}.
One contribution of our earlier paper is to give a convenient
specification for what POSIX values are (the inductive rules are shown in
Figure~\ref{POSIXrules}).
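
To illustrate the two rules with a small example (ours, not one taken from the
cited papers): consider @{text "r = (a + ab)(b + 1)"}, where @{text 1} matches
only the empty string, and the input @{text "ab"}. There are two possible
matches: the left factor consumes @{text a} and the right factor @{text b}, or
the left factor consumes @{text "ab"} and the right factor the empty string.
The maximal munch rule selects the second match, because the earlier
subexpression must consume as much of the input as possible; rule priority
only breaks the remaining ties, for example preferring the left alternative in
@{text "a + a"} when both alternatives match the same string.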
@@ -512,7 +512,7 @@
\end{proposition}
\noindent
- In fact we have shown that in the success case the generated POSIX value $v$ is
+ In fact we have shown that, in the success case, the generated POSIX value $v$ is
  unique and, in the failure case, that there is no POSIX value $v$ that satisfies
$(s, r) \rightarrow v$. While the algorithm is correct, it is excruciatingly
slow in cases where the derivatives grow arbitrarily (recall the example from the
@@ -1166,7 +1166,7 @@
\noindent do not hold under simplification---this property
  essentially states that we can retrieve the same value from a
simplified version of the regular expression. To start with @{text retrieve}
- depends on the fact that the value @{text v} correspond to the
+ depends on the fact that the value @{text v} corresponds to the
structure of the regular expression @{text r}---but the whole point of simplification
is to ``destroy'' this structure by making the regular expression simpler.
To see this consider the regular expression @{text "r = r' + 0"} and a corresponding
@@ -1300,8 +1300,8 @@
@{thm[mode=Axiom] bs6}$A0$\qquad
@{thm[mode=Axiom] bs7}$A1$\\
@{thm[mode=Rule] bs8[of "rs\<^sub>1" "rs\<^sub>2"]}$AL$\\
- @{thm[mode=Rule] ss2[of "rs\<^sub>1" "rs\<^sub>2"]}$LH$\qquad
- @{thm[mode=Rule] ss3[of "r\<^sub>1" "r\<^sub>2"]}$LT$\\
+ @{thm[mode=Rule] ss2[of "rs\<^sub>1" "rs\<^sub>2"]}$LT$\qquad
+ @{thm[mode=Rule] ss3[of "r\<^sub>1" "r\<^sub>2"]}$LH$\\
@{thm[mode=Axiom] ss4}$L\ZERO$\qquad
@{thm[mode=Axiom] ss5[of "bs" "rs\<^sub>1" "rs\<^sub>2"]}$LS$\medskip\\
@{thm[mode=Rule] ss6[of "r\<^sub>2" "r\<^sub>1" "rs\<^sub>1" "rs\<^sub>2" "rs\<^sub>3"]}$LD$\\
@@ -1400,7 +1400,7 @@
to introduce our own definitions and proof ideas in order to establish the
correctness. Our interest in the second algorithm
lies in the fact that by using bitcoded regular expressions and an aggressive
- simplification method there is a chance that the the derivatives
+ simplification method there is a chance that the derivatives
can be kept universally small (we established in this paper that
  their size can be kept finitely bounded for any string). This is important if one is after
an efficient POSIX lexing algorithm based on derivatives.
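
To give a flavour of such an aggressive simplification, here is a deliberately
stripped-down sketch in Haskell: it flattens nested alternatives, removes
@{text ZERO} summands and duplicate summands, and collapses degenerate
sequences. The verified simplification in the paper works on bitcoded regular
expressions and additionally has to maintain the annotations (via fusing),
which this sketch omits.

  import Data.List (nub)

  -- Plain regular expressions with list-based alternatives
  -- (bitcode annotations omitted for brevity).
  data Rexp = Zero | One | Chr Char
            | Alts [Rexp] | Seq Rexp Rexp | Star Rexp
            deriving Eq

  -- One simplification pass: simplify subexpressions, flatten nested
  -- alternatives, drop Zero summands and duplicates, and collapse
  -- the degenerate sequence cases.
  simp :: Rexp -> Rexp
  simp (Seq r1 r2) = case (simp r1, simp r2) of
    (Zero, _)  -> Zero
    (_, Zero)  -> Zero
    (One, r)   -> r
    (r, One)   -> r
    (r1', r2') -> Seq r1' r2'
  simp (Alts rs) =
    case nub (filter (/= Zero) (concatMap flatten (map simp rs))) of
      []  -> Zero
      [r] -> r
      rs' -> Alts rs'
    where
      flatten (Alts rs') = rs'
      flatten r          = [r]
  simp r = r

Interleaving a pass like this with every derivative step is the mechanism by
which the derivatives are kept small.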
@@ -1425,9 +1425,9 @@
  derivatives in some cases even after aggressive simplification, this
  claim is hard to believe. A similar claim about a theoretical runtime
of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates
- tokens according to POSIX rules~\cite{verbatim}. For this it uses Brzozowski's
+ tokens according to POSIX rules~\cite{verbatim}. For this Verbatim uses Brzozowski's
  derivatives as in our work.
- They write: ``The results of our empirical tests [..] confirm that Verbatim has
+ The authors write: ``The results of our empirical tests [..] confirm that Verbatim has
@{text "O(n\<^sup>2)"} time complexity.'' \cite[Section~VII]{verbatim}.
While their correctness proof for Verbatim is formalised in Coq, the claim about
  the runtime complexity is only supported by some empirical evidence obtained
--- a/thys2/Paper/document/root.tex Tue Mar 29 10:57:02 2022 +0100
+++ b/thys2/Paper/document/root.tex Wed Mar 30 18:02:40 2022 +0100
@@ -85,7 +85,7 @@
The purpose of the bitcodes is to generate POSIX values incrementally while
derivatives are calculated. They also help with designing
an ``aggressive'' simplification function that keeps the size of
- derivatives finite. Without simplification the size of some derivatives can grow
+ derivatives finitely bounded. Without simplification, the size of some derivatives can grow
  arbitrarily large, resulting in an extremely slow lexing algorithm. In this
  paper we describe a variant of Sulzmann and Lu's algorithm: our
variant is a recursive functional program, whereas Sulzmann
Binary file thys2/paper.pdf has changed