--- a/thys3/Paper.thy Thu Feb 16 23:23:22 2023 +0000
+++ b/thys3/Paper.thy Mon Feb 20 23:40:30 2023 +0000
@@ -216,7 +216,10 @@
of POSIX lexing from our earlier work \cite{AusafDyckhoffUrban2016}. This serves
as a reference point for what correctness means in our Isabelle/HOL proofs. We shall then prove
the correctness for the bitcoded algorithm without simplification, and
-after that extend the proof to include simplification.\mbox{}\\[-6mm]
+after that extend the proof to include simplification.
+Our Isabelle code including the results from Sec.~5 is available
+from \textcolor{darkblue}{\url{https://github.com/urbanchr/posix}}.
+\mbox{}\\[-6mm]
*}
@@ -256,7 +259,7 @@
In our work here we also add to the usual ``basic'' regular
expressions the \emph{bounded} regular expression @{term "NTIMES r
n"} where the @{term n} specifies that @{term r} should match
- exactly @{term n}-times. Again for brevity we omit the other bounded
+ exactly @{term n}-times (it is not included in Sulzmann and Lu's original work). For brevity we omit the other bounded
regular expressions @{text "r"}$^{\{..n\}}$, @{text "r"}$^{\{n..\}}$
and @{text "r"}$^{\{n..m\}}$ which specify intervals for how many
times @{text r} should match. The results presented in this paper
@@ -278,18 +281,18 @@
adhoc limits for bounded regular expressions: For example in the
regular expression matching library in the Go language and also in Google's RE2 library the regular expression
@{term "NTIMES a 1001"} is not permitted, because no counter can be
- above 1000; and in the built-in regular expression library in Rust
+ above 1000; and in the regular expression library in Rust
expressions such as @{text "a\<^bsup>{1000}{100}{5}\<^esup>"} give an error
message for being too big. Up until recently,\footnote{up until version 1.5.4 of the regex
library in Rust; see also CVE-2022-24713.} Rust
- however happily generated an automaton for the regular expression
+ however happily generated automata for regular expressions such as
@{text "a\<^bsup>{0}{4294967295}\<^esup>"}. This was due to a bug
in the algorithm that decides when a regular expression is acceptable
- or too big according to Rust's classification. We shall come back to
+ or too big according to Rust's classification (it did not account for the fact that @{text "a\<^bsup>{0}\<^esup>"} and similar examples can match the empty string). We shall come back to
this example later in the paper.
These problems can of course be solved in matching
algorithms where automata go beyond the classic notion and for
- instance include explicit counters (see for example~\cite{CountingSet2020}).
+ instance include explicit counters (e.g.~\cite{CountingSet2020}).
The point here is that Brzozowski derivatives and the algorithms by
Sulzmann and Lu can be straightforwardly extended to deal with
bounded regular expressions and moreover the resulting code
@@ -874,7 +877,7 @@
bitcoded regular expressions.
%
\begin{center}
- \begin{tabular}{@ {\hspace{-1mm}}c@ {\hspace{6mm}}c@ {}}
+ \begin{tabular}{@ {\hspace{-1mm}}c@ {\hspace{10mm}}c@ {}}
\begin{tabular}{@ {}l@ {\hspace{0.5mm}}c@ {\hspace{1mm}}l}
$\textit{bnullable}\,(\textit{ZERO})$ & $\dn$ & $\textit{False}$\\
$\textit{bnullable}\,(\textit{ONE}\,bs)$ & $\dn$ & $\textit{True}$\\
@@ -1460,7 +1463,7 @@
\begin{figure}[t]
\begin{center}
\begin{tabular}{@ {\hspace{-8mm}}c@ {}}
- \\[-7mm]
+ \\[-8mm]
@{thm[mode=Axiom] bs1[of _ "r\<^sub>2"]}$S\ZERO{}_l$\quad
@{thm[mode=Axiom] bs2[of _ "r\<^sub>1"]}$S\ZERO{}_r$\quad
@{thm[mode=Axiom] bs3[of "bs\<^sub>1" "bs\<^sub>2"]}$S\ONE$\\
@@ -1569,7 +1572,7 @@
Note that while Antimirov was able to give a bound on the \emph{size}
of his partial derivatives~\cite{Antimirov95}, Brzozowski gave a bound
on the \emph{number} of derivatives, provided they are quotient via
-ACI rules \cite{Brzozowski1964}. Brozozowski's result is crucial when one
+ACI rules \cite{Brzozowski1964}. Brzozowski's result is crucial when one
uses his derivatives for obtaining a DFA (it essentially bounds
the number of states). However, this result does \emph{not}
transfer to our setting where we are interested in the \emph{size} of the
@@ -1594,7 +1597,7 @@
\cite{Sulzmann2014}. This follows earlier work where we established
the correctness of the first algorithm
\cite{AusafDyckhoffUrban2016}. In the earlier work we needed to
- introduce our own specification about what POSIX values are,
+ introduce our own specification for POSIX values,
because the informal definition given by Sulzmann and Lu did not
stand up to formal proof. Also for the second algorithm we needed
to introduce our own definitions and proof ideas in order to
@@ -1603,12 +1606,13 @@
aggressive simplification method there is a chance that the
derivatives can be kept universally small (we established in this
paper that for a given $r$ they can be kept finitely bounded for
- all strings). Our formalisation is approximately 7500 lines of
+ \emph{all} strings). Our formalisation is approximately 7500 lines of
Isabelle code. A little more than half of this code concerns the finiteness bound
obtained in Section 5. This slight ``bloat'' in the latter part is because
we had to introduce an intermediate datatype for annotated regular expressions and repeat many
- definitions for this intermediate datatype. But overall this made our
- formalisation work smoother.
+ definitions for this intermediate datatype. But overall we think this made our
+ formalisation work smoother. The code of our formalisation can be found at
+ \textcolor{darkblue}{\url{https://github.com/urbanchr/posix}}.
%This is important if one is after
%an efficient POSIX lexing algorithm based on derivatives.
@@ -1697,7 +1701,7 @@
@{text a} does \emph{not} match this regular expression. Therefore
it is probably a wise choice that in newer versions of Rust's
regular expression library such regular expressions are declared as
- ``too big''. While this is clearly
+ ``too big'' and raise an error message. While this is clearly
a contrived example, the safety guaranties Rust wants to provide necessitate
this conservative approach.
However, with the derivatives and simplifications we have shown
@@ -1730,11 +1734,11 @@
%Possible ideas are
%zippers which have been used in the context of parsing of
%context-free grammars \cite{zipperparser}.
- \\[-4mm] %\smallskip
+ %\\[-5mm] %\smallskip
- \noindent
- Our Isabelle code including the results from Sec.~5 is available from \url{https://github.com/urbanchr/posix}.
- \\[-10mm]
+ %\noindent
+ %Our Isabelle code including the results from Sec.~5 is available from \url{https://github.com/urbanchr/posix}.
+ %%\\[-10mm]
%%\bibliographystyle{plain}
--- a/thys3/document/root.tex Thu Feb 16 23:23:22 2023 +0000
+++ b/thys3/document/root.tex Mon Feb 20 23:40:30 2023 +0000
@@ -53,6 +53,7 @@
\newcommand{\ZERO}{\mbox{\bf 0}}
\newcommand{\ONE}{\mbox{\bf 1}}
\def\rs{\mathit{rs}}
+\definecolor{darkblue}{rgb}{0,0,0.6}
\def\Brz{Brzozowski}
\def\der{\backslash}