--- a/ninems/ninems.tex Tue Jul 02 22:24:27 2019 +0100
+++ b/ninems/ninems.tex Wed Jul 03 09:48:42 2019 +0100
@@ -65,13 +65,16 @@
Brzozowski introduced in 1964 a beautifully simple algorithm for
regular expression matching based on the notion of derivatives of
regular expressions. In 2014, Sulzmann and Lu extended this
- algorithm to not just give a YES/NO answer for whether or not a regular
- expression matches a string, but in case it matches also \emph{how}
- it matches the string. This is important for applications such as
- lexing (tokenising a string). The problem is to make the algorithm
- by Sulzmann and Lu fast on all inputs without breaking its
- correctness. We have already developed some simplification rules, but have not shown that they
- preserve the correctness. We also have not yet looked at extended regular expressions.
+ algorithm to not just give a YES/NO answer for whether or not a
+ regular expression matches a string, but in case it matches also
+ \emph{how} it matches the string. This is important for
+ applications such as lexing (tokenising a string). The problem is to
+ make the algorithm by Sulzmann and Lu fast on all inputs without
+ breaking its correctness. We have already developed some
+ simplification rules for this, but have not proved that they
+ preserve the correctness of the algorithm. We also have not yet
+ looked at extended regular expressions, such as bounded repetitions,
+ negation and back-references.
\end{abstract}
\section{Introduction}
@@ -242,7 +245,7 @@
\section{The Algorithms by Brzozowski, and Sulzmann and Lu}
-Suppose basic regular expressions are given by the following grammar:\\
+Suppose (basic) regular expressions are given by the following grammar:
\[ r ::= \ZERO \mid \ONE
\mid c
\mid r_1 \cdot r_2
@@ -251,15 +254,18 @@
\]
\noindent
-The intended meaning of the regular expressions is as usual: $\ZERO$
+The intended meaning of the constructors is as usual: $\ZERO$
cannot match any string, $\ONE$ can match the empty string, the
character regular expression $c$ can match the character $c$, and so
-on. The brilliant contribution by Brzozowski is the notion of
+on.
+
+The brilliant contribution by Brzozowski is the notion of
\emph{derivatives} of regular expressions. The idea behind this
notion is as follows: suppose a regular expression $r$ can match a
string of the form $c\!::\! s$ (that is a list of characters starting
with $c$), what does the regular expression look like that can match
-just $s$? Brzozowski gave a neat answer to this question. He started with the definition of $nullable$:
+just $s$? Brzozowski gave a neat answer to this question. He started
+with the definition of $nullable$:
\begin{center}
\begin{tabular}{lcl}
$\nullable(\ZERO)$ & $\dn$ & $\mathit{false}$ \\
@@ -295,8 +301,8 @@
%Assuming the classic notion of a
%\emph{language} of a regular expression, written $L(\_)$, t
-The main
-property of the derivative operation is that
+\noindent
+The main property of the derivative operation is that
\begin{center}
$c\!::\!s \in L(r)$ holds
@@ -316,37 +322,46 @@
For the moment however, we focus only on the usual basic regular expressions.
-Now if we want to find out whether a string $s$
-matches with a regular expression $r$, build the derivatives of $r$
-w.r.t.\ (in succession) all the characters of the string $s$. Finally,
-test whether the resulting regular expression can match the empty
-string. If yes, then $r$ matches $s$, and no in the negative
-case.
-
-For this we can generalise the derivative operation for strings like this:
+Now if we want to find out whether a string $s$ matches with a regular
+expression $r$, build the derivatives of $r$ w.r.t.\ (in succession)
+all the characters of the string $s$. Finally, test whether the
+resulting regular expression can match the empty string. If yes, then
+$r$ matches $s$, and no in the negative case. To implement this idea
+we can generalise the derivative operation to strings like this:
\begin{center}
\begin{tabular}{lcl}
$r \backslash (c\!::\!s) $ & $\dn$ & $(r \backslash c) \backslash s$ \\
-$r \backslash \epsilon $ & $\dn$ & $r$
+$r \backslash [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
+
\noindent
-Using the above definition we obtain a simple and elegant regular
+Using this definition we obtain a simple and elegant regular
expression matching algorithm:
\[
match\;s\;r \;\dn\; nullable(r\backslash s)
\]
-This algorithm can be illustrated as follows:
+
+\noindent
+Pictorially this algorithm can be illustrated as follows:
+
+\begin{center}
\begin{tikzcd}\label{graph:*}
r_0 \arrow[r, "\backslash c_0"] & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed] & r_n \arrow[r,"nullable?"] & Yes/No
\end{tikzcd}
-where we bnuild the successive derivative until we exhaust the string.
+\end{center}
+
+\noindent
+where we start with a regular expression $r_0$, build successive derivatives
+until we exhaust the string and then use \textit{nullable} to test whether the
+result can match the empty string. It can be relatively easily shown that this
+matcher is correct.
One limitation, however, of Brzozowski's algorithm is that it only
produces a YES/NO answer for whether a string is being matched by a
regular expression. Sulzmann and Lu~\cite{Sulzmann2014} extended this
algorithm to allow generation of an actual matching, called a
-\emph{value}.
+\emph{value}.
\begin{center}
\begin{tabular}{c@{\hspace{20mm}}c}