lexing: comparison thys2/Paper/Paper.thy


-:37d14cbce020
+:726f4e65c0fe
 expression @{term r}: if the derivative of @{term r} w.r.t.\ (in
 succession) all the characters of the string matches the empty string,
 then @{term r} matches @{term s} (and {\em vice versa}).  We are aware
 of a mechanised correctness proof of Brzozowski's derivative-based matcher in HOL4 by
 Owens and Slind~\cite{Owens2008}. Another one in Isabelle/HOL is part
-of the work by Krauss and Nipkow \cite{Krauss2011}.  And another one
+of the work by Krauss and Nipkow~\cite{Krauss2011}.  And another one
 in Coq is given by Coquand and Siles~\cite{Coquand2012}.
 Also Ribeiro and Du Bois give one in Agda~\cite{RibeiroAgda2017}.
 However, there are two difficulties with derivative-based matchers:
 regular expressions. In this version, POSIX values are
 represented as bitsequences and such sequences are incrementally generated
 when derivatives are calculated. The compact representation
 of bitsequences and regular expressions allows them to define a more
 ``aggressive'' simplification method that keeps the size of the
-derivatives finite no matter what the length of the string is.
+derivatives finitely bounded no matter what the length of the string is.
 They make some informal claims about the correctness and linear behaviour
 of this version, but do not provide any supporting proof arguments, not
 even ``pencil-and-paper'' arguments. They write about their bitcoded
 \emph{incremental parsing method} (that is the algorithm to be formalised
 in this paper):
 We shall also introduce our own argument and definitions for
 establishing the correctness of the bitcoded algorithm when
 simplifications are included.\medskip
 \noindent In this paper, we shall first briefly introduce the basic notions
-of regular expressions and describe the basic definitions
+of regular expressions and describe the definition
 of POSIX lexing from our earlier work \cite{AusafDyckhoffUrban2016}. This serves
 as a reference point for what correctness means in our Isabelle/HOL proofs. We shall then prove
 the correctness for the bitcoded algorithm without simplification, and
 after that extend the proof to include simplification.
 string that satisfies the informal POSIX rules (see
 \cite{POSIX,Kuklewicz,OkuiSuzuki2010,Sulzmann2014,Vansummeren2006}).\footnote{POSIX
 	lexing acquired its name from the fact that the corresponding
 	rules were described as part of the POSIX specification for
 	Unix-like operating systems \cite{POSIX}.} Sometimes these
-informal rules are called \emph{maximal much rule} and \emph{rule priority}.
+informal rules are called \emph{maximal munch rule} and \emph{rule priority}.
 One contribution of our earlier paper is to give a convenient
 specification for what POSIX values are (the inductive rules are shown in
 Figure~\ref{POSIXrules}).
 \begin{figure}[t]
 (2) & @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}\\
 \end{tabular}
 \end{proposition}
 \noindent
-In fact we have shown that in the success case the generated POSIX value $v$ is
+In fact we have shown that, in the success case, the generated POSIX value $v$ is
 unique and in the failure case that there is no POSIX value $v$ that satisfies
 $(s, r) \rightarrow v$. While the algorithm is correct, it is excruciatingly
 slow in cases where the derivatives grow arbitrarily (recall the example from the
 Introduction). However, it can be used as a convenient reference point for the correctness
 proof of the second algorithm by Sulzmann and Lu, which we shall describe next.
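The growth of derivatives mentioned above can be observed with a small sketch of a plain derivative-based matcher without any simplification. This is only an illustration in Python and not the Isabelle/HOL formalisation: the names \texttt{Rexp}, \texttt{nullable}, \texttt{der}, \texttt{ders}, \texttt{matches} and \texttt{size} are hypothetical, and the regular expression $(a^*)^* \cdot b$ is an assumption chosen for illustration, not necessarily the example from the Introduction.

```python
from dataclasses import dataclass


class Rexp:
    """Base class for regular expressions (hypothetical names)."""


@dataclass(frozen=True)
class Zero(Rexp):      # matches no string
    pass


@dataclass(frozen=True)
class One(Rexp):       # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Rexp):       # matches a single character
    c: str


@dataclass(frozen=True)
class Alt(Rexp):       # alternative r1 + r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Seq(Rexp):       # sequence r1 . r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Star(Rexp):      # Kleene star r*
    r: Rexp


def nullable(r):
    """Does r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero and Chr


def der(c, r):
    """Brzozowski derivative of r w.r.t. the character c (no simplification)."""
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    return Zero()  # Zero and One


def ders(s, r):
    """Derivative w.r.t. (in succession) all characters of the string s."""
    for c in s:
        r = der(c, r)
    return r


def matches(r, s):
    """r matches s iff the derivative w.r.t. s is nullable."""
    return nullable(ders(s, r))


def size(r):
    """Number of constructors in r."""
    if isinstance(r, (Alt, Seq)):
        return 1 + size(r.r1) + size(r.r2)
    if isinstance(r, Star):
        return 1 + size(r.r)
    return 1


# (a*)* . b, an illustrative choice only
r = Seq(Star(Star(Chr("a"))), Chr("b"))
print(matches(r, "aaab"))
print([size(ders("a" * n, r)) for n in range(4)])
```

For this particular expression the printed sizes increase with every further character, which is the kind of growth that motivates the simplification of derivatives; the sketch makes no claim about how fast derivatives grow in general.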
 \]
 \noindent do not hold under simplification---this property
 essentially purports that we can retrieve the same value from a
 simplified version of the regular expression. To start with, @{text retrieve}
-depends on the fact that the value @{text v} correspond to the
+depends on the fact that the value @{text v} corresponds to the
 structure of the regular expression @{text r}---but the whole point of simplification
 is to ``destroy'' this structure by making the regular expression simpler.
 To see this consider the regular expression @{text "r = r' + 0"} and a corresponding
 value @{text "v = Left v'"}. If we annotate bitcodes to @{text "r"}, then
 we can use @{text retrieve} with @{text r} and @{text v} in order to extract a corresponding
 @{thm[mode=Rule] bs4[of "r\<^sub>1" "r\<^sub>2" _ "r\<^sub>3"]}SL\qquad
 @{thm[mode=Rule] bs5[of "r\<^sub>3" "r\<^sub>4" _ "r\<^sub>1"]}SR\\
 @{thm[mode=Axiom] bs6}$A0$\qquad
 @{thm[mode=Axiom] bs7}$A1$\\
 @{thm[mode=Rule] bs8[of "rs\<^sub>1" "rs\<^sub>2"]}$AL$\\
-@{thm[mode=Rule] ss2[of "rs\<^sub>1" "rs\<^sub>2"]}$LH$\qquad
+@{thm[mode=Rule] ss2[of "rs\<^sub>1" "rs\<^sub>2"]}$LT$\qquad
-@{thm[mode=Rule] ss3[of "r\<^sub>1" "r\<^sub>2"]}$LT$\\
+@{thm[mode=Rule] ss3[of "r\<^sub>1" "r\<^sub>2"]}$LH$\\
 @{thm[mode=Axiom] ss4}$L\ZERO$\qquad
 @{thm[mode=Axiom] ss5[of "bs" "rs\<^sub>1" "rs\<^sub>2"]}$LS$\medskip\\
 @{thm[mode=Rule] ss6[of "r\<^sub>2" "r\<^sub>1" "rs\<^sub>1" "rs\<^sub>2" "rs\<^sub>3"]}$LD$\\
 \end{tabular}
 \end{center}
 because the informal definition given by Sulzmann and Lu did not
 stand up to a formal proof. Also for the second algorithm we needed
 to introduce our own definitions and proof ideas in order to establish the
 correctness.  Our interest in the second algorithm
 lies in the fact that by using bitcoded regular expressions and an aggressive
-simplification method there is a chance that the the derivatives
+simplification method there is a chance that the derivatives
 can be kept universally small (we established in this paper that
 they can be kept finitely bounded for any string). This is important if one is after
 an efficient POSIX lexing algorithm based on derivatives.
 Having proved the correctness of the POSIX lexing algorithm, which
 \cite[Page 14]{Sulzmann2014}.
 Given the growth of the
 derivatives in some cases even after aggressive simplification, this
 is a hard-to-believe claim. A similar claim about a theoretical runtime
 of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates
-tokens according to POSIX rules~\cite{verbatim}. For this it uses Brzozowski's
+tokens according to POSIX rules~\cite{verbatim}. For this Verbatim uses Brzozowski's
 derivatives, as in our work.
-They write: ``The results of our empirical tests [..] confirm that Verbatim has
+The authors write: ``The results of our empirical tests [..] confirm that Verbatim has
 @{text "O(n\<^sup>2)"} time complexity.'' \cite[Section~VII]{verbatim}.
 While their correctness proof for Verbatim is formalised in Coq, the claim about
 the runtime complexity is only supported by some empirical evidence obtained
 by using the code extraction facilities of Coq.
 Given our observation with the ``growth problem'' of derivatives,

changeset 474	726f4e65c0fe
parent 464	e6248d2c20c2