lexing: changeset 643:9580bae0500d

--- a/thys3/Paper.thy	Thu Feb 16 23:23:22 2023 +0000
+++ b/thys3/Paper.thy	Mon Feb 20 23:40:30 2023 +0000
@@ -216,7 +216,10 @@
 of POSIX lexing from our earlier work \cite{AusafDyckhoffUrban2016}. This serves
 as a reference point for what correctness means in our Isabelle/HOL proofs. We shall then prove
 the correctness for the bitcoded algorithm without simplification, and
-after that extend the proof to include simplification.\mbox{}\\[-6mm]
+after that extend the proof to include simplification.
+Our Isabelle code including the results from Sec.~5 is available
+from \textcolor{darkblue}{\url{https://github.com/urbanchr/posix}}.
+\mbox{}\\[-6mm]
 
 *}
 
@@ -256,7 +259,7 @@
   In our work here we also add to the usual ``basic'' regular
   expressions the \emph{bounded} regular expression @{term "NTIMES r
   n"} where the @{term n} specifies that @{term r} should match
-  exactly @{term n}-times. Again for brevity we omit the other bounded
+  exactly @{term n}-times (it is not included in Sulzmann and Lu's original work). For brevity we omit the other bounded
   regular expressions @{text "r"}$^{\{..n\}}$, @{text "r"}$^{\{n..\}}$
   and @{text "r"}$^{\{n..m\}}$ which specify intervals for how many
   times @{text r} should match. The results presented in this paper
@@ -278,18 +281,18 @@
   adhoc limits for bounded regular expressions: For example in the
   regular expression matching library in the Go language and also in Google's RE2 library the regular expression
   @{term "NTIMES a 1001"} is not permitted, because no counter can be
-  above 1000; and in the built-in regular expression library in Rust
+  above 1000; and in the regular expression library in Rust
   expressions such as @{text "a\<^bsup>{1000}{100}{5}\<^esup>"} give an error
   message for being too big. Up until recently,\footnote{up until version 1.5.4 of the regex
   library in Rust; see also CVE-2022-24713.} Rust
-  however happily generated an automaton for the regular expression
+  however happily generated automata for regular expressions such as 
   @{text "a\<^bsup>{0}{4294967295}\<^esup>"}. This was due to a bug
   in the algorithm that decides when a regular expression is acceptable
-  or too big according to Rust's classification. We shall come back to
+  or too big according to Rust's classification (it did not account for the fact that @{text "a\<^bsup>{0}\<^esup>"} and similar examples can match the empty string). We shall come back to
   this example later in the paper. 
   These problems can of course be solved in matching
   algorithms where automata go beyond the classic notion and for
-  instance include explicit counters (see for example~\cite{CountingSet2020}).
+  instance include explicit counters (e.g.~\cite{CountingSet2020}).
   The point here is that Brzozowski derivatives and the algorithms by
   Sulzmann and Lu can be straightforwardly extended to deal with
   bounded regular expressions and moreover the resulting code
@@ -874,7 +877,7 @@
   bitcoded regular expressions.
   %
   \begin{center}
-  \begin{tabular}{@ {\hspace{-1mm}}c@ {\hspace{6mm}}c@ {}}
+  \begin{tabular}{@ {\hspace{-1mm}}c@ {\hspace{10mm}}c@ {}}
   \begin{tabular}{@ {}l@ {\hspace{0.5mm}}c@ {\hspace{1mm}}l}
   $\textit{bnullable}\,(\textit{ZERO})$ & $\dn$ & $\textit{False}$\\
   $\textit{bnullable}\,(\textit{ONE}\,bs)$ & $\dn$ & $\textit{True}$\\
@@ -1460,7 +1463,7 @@
    \begin{figure}[t]
   \begin{center}
   \begin{tabular}{@ {\hspace{-8mm}}c@ {}}
-  \\[-7mm]
+  \\[-8mm]
   @{thm[mode=Axiom] bs1[of _ "r\<^sub>2"]}$S\ZERO{}_l$\quad
   @{thm[mode=Axiom] bs2[of _ "r\<^sub>1"]}$S\ZERO{}_r$\quad
   @{thm[mode=Axiom] bs3[of "bs\<^sub>1" "bs\<^sub>2"]}$S\ONE$\\
@@ -1569,7 +1572,7 @@
 Note that while Antimirov was able to give a bound on the \emph{size}
 of his partial derivatives~\cite{Antimirov95}, Brzozowski gave a bound
 on the \emph{number} of derivatives, provided they are quotient via
-ACI rules \cite{Brzozowski1964}. Brozozowski's result is crucial when one
+ACI rules \cite{Brzozowski1964}. Brzozowski's result is crucial when one
 uses his derivatives for obtaining a DFA (it essentially bounds
 the number of states). However, this result does \emph{not}
 transfer to our setting where we are interested in the \emph{size} of the
@@ -1594,7 +1597,7 @@
    \cite{Sulzmann2014}. This follows earlier work where we established
    the correctness of the first algorithm
    \cite{AusafDyckhoffUrban2016}. In the earlier work we needed to
-   introduce our own specification about what POSIX values are,
+   introduce our own specification for POSIX values,
    because the informal definition given by Sulzmann and Lu did not
    stand up to formal proof. Also for the second algorithm we needed
    to introduce our own definitions and proof ideas in order to
@@ -1603,12 +1606,13 @@
    aggressive simplification method there is a chance that the
    derivatives can be kept universally small (we established in this
    paper that for a given $r$ they can be kept finitely bounded for
-   all strings).  Our formalisation is approximately 7500 lines of
+   \emph{all} strings).  Our formalisation is approximately 7500 lines of
    Isabelle code. A little more than half of this code concerns the finiteness bound
    obtained in Section 5. This slight ``bloat'' in the latter part is because
    we had to introduce an intermediate datatype for annotated regular expressions and repeat many
-   definitions for this intermediate datatype. But overall this made our
-   formalisation work smoother.
+   definitions for this intermediate datatype. But overall we think this made our
+   formalisation work smoother. The code of our formalisation can be found at
+   \textcolor{darkblue}{\url{https://github.com/urbanchr/posix}}.
    %This is important if one is after
    %an efficient POSIX lexing algorithm based on derivatives.
 
@@ -1697,7 +1701,7 @@
    @{text a} does \emph{not} match this regular expression.  Therefore
    it is probably a wise choice that in newer versions of Rust's
    regular expression library such regular expressions are declared as
-   ``too big''. While this is clearly
+   ``too big'' and raise an error message. While this is clearly
    a contrived example, the safety guaranties Rust wants to provide necessitate
    this conservative approach.
    However, with the derivatives and simplifications we have shown
@@ -1730,11 +1734,11 @@
    %Possible ideas are
    %zippers which have been used in the context of parsing of
    %context-free grammars \cite{zipperparser}.
-   \\[-4mm]  %\smallskip
+   %\\[-5mm]  %\smallskip
    
-   \noindent
-   Our Isabelle code including the results from Sec.~5 is available from \url{https://github.com/urbanchr/posix}.
-   \\[-10mm]
+   %\noindent
+   %Our Isabelle code including the results from Sec.~5 is available from \url{https://github.com/urbanchr/posix}.
+   %%\\[-10mm]
 
 
 %%\bibliographystyle{plain}

--- a/thys3/document/root.tex	Thu Feb 16 23:23:22 2023 +0000
+++ b/thys3/document/root.tex	Mon Feb 20 23:40:30 2023 +0000
@@ -53,6 +53,7 @@
 \newcommand{\ZERO}{\mbox{\bf 0}}
 \newcommand{\ONE}{\mbox{\bf 1}}
 \def\rs{\mathit{rs}}
+\definecolor{darkblue}{rgb}{0,0,0.6}
 
 \def\Brz{Brzozowski}
 \def\der{\backslash}

author	Christian Urban <christian.urban@kcl.ac.uk>
	Mon, 20 Feb 2023 23:40:30 +0000
changeset 643	9580bae0500d
parent 642	6c13f76c070b
child 644	9f984ff20020

thys3/Paper.thy		file \| annotate \| diff \| comparison \| revisions
thys3/document/root.tex		file \| annotate \| diff \| comparison \| revisions