# HG changeset patch
# User Chengsong
# Date 1648851480 -3600
# Node ID ab6aaf8d7649de4dc7b8610825fbc509f902f3f1
# Parent 56837303ce61cfa065a7af3245f2e22af54a619c
# Parent 726f4e65c0feed933cbd1658ee9acc60d27f570c
out

diff -r 56837303ce61 -r ab6aaf8d7649 thys2/Paper/Paper.thy
--- a/thys2/Paper/Paper.thy	Fri Apr 01 23:17:40 2022 +0100
+++ b/thys2/Paper/Paper.thy	Fri Apr 01 23:18:00 2022 +0100
@@ -100,7 +100,7 @@
 then @{term r} matches @{term s} (and {\em vice versa}). We are aware
 of a mechanised correctness proof of Brzozowski's derivative-based matcher in HOL4 by
 Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part
-of the work by Krauss and Nipkow \cite{Krauss2011}. And another one
+of the work by Krauss and Nipkow~\cite{Krauss2011}. And another one
 in Coq is given by Coquand and Siles \cite{Coquand2012}.
 Also Ribeiro and Du Bois give one in Agda~\cite{RibeiroAgda2017}.
 
@@ -161,7 +161,7 @@
 when derivatives are calculated. The compact representation of
 bitsequences and regular expressions allows them to define a more
 ``aggressive'' simplification method that keeps the size of the
-derivatives finite no matter what the length of the string is.
+derivatives finitely bounded no matter what the length of the string is.
 They make some informal claims about the correctness and linear behaviour
 of this version, but do not provide any supporting proof arguments, not
 even ``pencil-and-paper'' arguments. They write about their bitcoded
@@ -195,7 +195,7 @@
 simplifications are included.\medskip
 
 \noindent In this paper, we shall first briefly introduce the basic notions
-of regular expressions and describe the basic definitions
+of regular expressions and describe the definition
 of POSIX lexing from our earlier work \cite{AusafDyckhoffUrban2016}. This serves
 as a reference point for what correctness means in our Isabelle/HOL proofs. We shall then prove
 the correctness for the bitcoded algorithm without simplification, and
@@ -358,7 +358,7 @@
   lexing acquired its name from the fact that the corresponding
   rules were described as part of the POSIX specification for
   Unix-like operating systems \cite{POSIX}.} Sometimes these
-  informal rules are called \emph{maximal much rule} and \emph{rule priority}.
+  informal rules are called \emph{maximal munch rule} and \emph{rule priority}.
   One contribution of our earlier paper is to give a convenient
   specification for what POSIX values are (the inductive rules are shown in
   Figure~\ref{POSIXrules}).
@@ -512,7 +512,7 @@
   \end{proposition}
 
   \noindent
-  In fact we have shown that in the success case the generated POSIX value $v$ is
+  In fact we have shown that, in the success case, the generated POSIX value $v$ is
   unique and in the failure case that there is no POSIX value $v$ that satisfies
   $(s, r) \rightarrow v$. While the algorithm is correct, it is excruciatingly
   slow in cases where the derivatives grow arbitrarily (recall the example from the
@@ -1166,7 +1166,7 @@
   \noindent
   do not hold under simplification---this property essentially purports that we can retrieve the same value from a
   simplified version of the regular expression. To start with @{text retrieve}
-  depends on the fact that the value @{text v} correspond to the
+  depends on the fact that the value @{text v} corresponds to the
   structure of the regular expression @{text r}---but the whole point of simplification
   is to ``destroy'' this structure by making the regular expression simpler.
  To see this consider the regular expression @{text "r = r' + 0"} and a corresponding
@@ -1300,8 +1300,8 @@
   @{thm[mode=Axiom] bs6}$A0$\qquad
   @{thm[mode=Axiom] bs7}$A1$\\
   @{thm[mode=Rule] bs8[of "rs\<^sub>1" "rs\<^sub>2"]}$AL$\\
-  @{thm[mode=Rule] ss2[of "rs\<^sub>1" "rs\<^sub>2"]}$LH$\qquad
-  @{thm[mode=Rule] ss3[of "r\<^sub>1" "r\<^sub>2"]}$LT$\\
+  @{thm[mode=Rule] ss2[of "rs\<^sub>1" "rs\<^sub>2"]}$LT$\qquad
+  @{thm[mode=Rule] ss3[of "r\<^sub>1" "r\<^sub>2"]}$LH$\\
   @{thm[mode=Axiom] ss4}$L\ZERO$\qquad
   @{thm[mode=Axiom] ss5[of "bs" "rs\<^sub>1" "rs\<^sub>2"]}$LS$\medskip\\
   @{thm[mode=Rule] ss6[of "r\<^sub>2" "r\<^sub>1" "rs\<^sub>1" "rs\<^sub>2" "rs\<^sub>3"]}$LD$\\
@@ -1400,7 +1400,7 @@
   to introduce our own definitions and proof ideas in order to establish the
   correctness. Our interest in the second algorithm
   lies in the fact that by using bitcoded regular expressions and an aggressive
-  simplification method there is a chance that the the derivatives
+  simplification method there is a chance that the derivatives
   can be kept universally small (we established in this paper that
   they can be kept finite for any string). This is important if one is after
   an efficient POSIX lexing algorithm based on derivatives.
@@ -1425,9 +1425,9 @@
   derivatives in some cases even after aggressive simplification, this
   is a hard to believe claim. A similar claim about a theoretical runtime
   of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates
-  tokens according to POSIX rules~\cite{verbatim}. For this it uses Brzozowski's
+  tokens according to POSIX rules~\cite{verbatim}. For this Verbatim uses Brzozowski's
   derivatives like in our work.
-  They write: ``The results of our empirical tests [..] confirm that Verbatim has
+  The authors write: ``The results of our empirical tests [..] confirm that Verbatim has
   @{text "O(n\<^sup>2)"} time complexity.'' \cite[Section~VII]{verbatim}.
   While their correctness proof for Verbatim is formalised in Coq, the claim about
   the runtime complexity is only supported by some emperical evidence obtained

diff -r 56837303ce61 -r ab6aaf8d7649 thys2/Paper/document/root.tex
--- a/thys2/Paper/document/root.tex	Fri Apr 01 23:17:40 2022 +0100
+++ b/thys2/Paper/document/root.tex	Fri Apr 01 23:18:00 2022 +0100
@@ -85,7 +85,7 @@
 The purpose of the bitcodes is to generate POSIX values incrementally while
 derivatives are calculated. They also help with designing
 an ``aggressive'' simplification function that keeps the size of
-derivatives finite. Without simplification the size of some derivatives can grow
+derivatives finitely bounded. Without simplification the size of some derivatives can grow
 arbitrarily big resulting in an extremely slow lexing algorithm.
 In this paper we describe a variant of Sulzmann and Lu's algorithm:
 Our variant is a recursive functional program, whereas Sulzmann

diff -r 56837303ce61 -r ab6aaf8d7649 thys2/paper.pdf
Binary file thys2/paper.pdf has changed
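
For readers who want a concrete handle on what the hunks above discuss, here is a minimal sketch of derivative-based matching with interleaved simplification. This is not the paper's algorithm: the formalisation is in Isabelle/HOL and additionally threads bitsequences through annotated regular expressions; the sketch below shows only plain Brzozowski derivatives plus a basic simplification step of the kind the patch text mentions (e.g. rewriting r + 0 to r). All names are illustrative.

```haskell
-- Minimal sketch (illustrative, not the paper's bitcoded algorithm):
-- Brzozowski derivatives with a simplification pass after each step.
data Rexp = Zero | One | Chr Char
          | Alt Rexp Rexp | Seq Rexp Rexp | Star Rexp
  deriving (Eq, Show)

-- nullable r: does r match the empty string?
nullable :: Rexp -> Bool
nullable Zero        = False
nullable One         = True
nullable (Chr _)     = False
nullable (Alt r1 r2) = nullable r1 || nullable r2
nullable (Seq r1 r2) = nullable r1 && nullable r2
nullable (Star _)    = True

-- Brzozowski derivative of r with respect to the character c
der :: Char -> Rexp -> Rexp
der _ Zero        = Zero
der _ One         = Zero
der c (Chr d)     = if c == d then One else Zero
der c (Alt r1 r2) = Alt (der c r1) (der c r2)
der c (Seq r1 r2)
  | nullable r1   = Alt (Seq (der c r1) r2) (der c r2)
  | otherwise     = Seq (der c r1) r2
der c (Star r)    = Seq (der c r) (Star r)

-- one bottom-up pass of the usual simplification rules,
-- e.g. r + 0 ~> r, 0 . r ~> 0, 1 . r ~> r, r + r ~> r
simp :: Rexp -> Rexp
simp (Alt r1 r2) = case (simp r1, simp r2) of
  (Zero, s2) -> s2
  (s1, Zero) -> s1
  (s1, s2)   -> if s1 == s2 then s1 else Alt s1 s2
simp (Seq r1 r2) = case (simp r1, simp r2) of
  (Zero, _)  -> Zero
  (_, Zero)  -> Zero
  (One, s2)  -> s2
  (s1, One)  -> s1
  (s1, s2)   -> Seq s1 s2
simp r = r

-- matcher: take a derivative per character, simplifying after each step
-- e.g. matches (Seq (Chr 'a') (Star (Chr 'b'))) "abb" == True
matches :: Rexp -> String -> Bool
matches r = nullable . foldl (\r' c -> simp (der c r')) r
```

Without the simp step, the Alt and Seq cases make the derivative grow with every character consumed; interleaving simplification is what keeps intermediate regular expressions small. Bounding that size for the full bitcoded version, for any input string, is the finiteness result the patch text refers to.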