--- a/ninems/ninems.tex Tue Jul 02 11:22:39 2019 +0100
+++ b/ninems/ninems.tex Tue Jul 02 14:01:42 2019 +0100
@@ -70,7 +70,8 @@
it matches the string. This is important for applications such as
lexing (tokenising a string). The problem is to make the algorithm
by Sulzmann and Lu fast on all inputs without breaking its
- correctness.
+ correctness. We have already developed some simplification rules, but have not shown that they
+ preserve the correctness. We also have not yet looked at extended regular expressions.
\end{abstract}
\section{Introduction}
@@ -241,14 +242,13 @@
\section{The Algorithms by Brzozowski, and Sulzmann and Lu}
-Suppose regular expressions are given by the following grammar:\\
- $r$ $::=$ $\ZERO$ $\mid$ $\ONE$
- $\mid$ $c$
- $\mid$ $r_1 \cdot r_2$
- $\mid$ $r_1 + r_2$
- $\mid$ $r^*$
-
-
+Suppose basic regular expressions are given by the following grammar:\\
+\[ r ::= \ZERO \mid \ONE
+ \mid c
+ \mid r_1 \cdot r_2
+ \mid r_1 + r_2
+ \mid r^*
+\]
\noindent
The intended meaning of the regular expressions is as usual: $\ZERO$
@@ -270,7 +270,7 @@
$\nullable(r^*)$ & $\dn$ & $\mathit{true}$ \\
\end{tabular}
\end{center}
-The function tests whether the empty string is in $L(r)$.
+This function simply tests whether the empty string is in $L(r)$.
He then defined
the following operation on regular expressions, written
$r\backslash c$ (the derivative of $r$ w.r.t.~the character $c$):
@@ -291,16 +291,6 @@
\noindent
- For us the main advantage is that derivatives can be
-straightforwardly implemented in any functional programming language,
-and are easily definable and reasoned about in theorem provers---the
-definitions just consist of inductive datatypes and simple recursive
-functions. Moreover, the notion of derivatives can be easily
-generalised to cover extended regular expression constructors such as
-the not-regular expression, written $\neg\,r$, or bounded repetitions
-(for example $r^{\{n\}}$ and $r^{\{n..m\}}$), which cannot be so
-straightforwardly realised within the classic automata approach.
-
%Assuming the classic notion of a
@@ -314,14 +304,26 @@
\end{center}
\noindent
-So if we want to find out whether a string $s$
+ For us the main advantage is that derivatives can be
+straightforwardly implemented in any functional programming language,
+and are easily definable and reasoned about in theorem provers---the
+definitions just consist of inductive datatypes and simple recursive
+functions. Moreover, the notion of derivatives can be easily
+generalised to cover extended regular expression constructors such as
+the not-regular expression, written $\neg\,r$, or bounded repetitions
+(for example $r^{\{n\}}$ and $r^{\{n..m\}}$), which cannot be so
+straightforwardly realised within the classic automata approach.
+For the moment however, we focus only on the usual basic regular expressions.
+
+
+Now if we want to find out whether a string $s$
matches with a regular expression $r$, build the derivatives of $r$
w.r.t.\ (in succession) all the characters of the string $s$. Finally,
test whether the resulting regular expression can match the empty
string. If yes, then $r$ matches $s$, and no in the negative
case.
-We can generalise the derivative operation for strings like this:
+For this we can generalise the derivative operation for strings like this:
\begin{center}
\begin{tabular}{lcl}
$r \backslash (c\!::\!s) $ & $\dn$ & $(r \backslash c) \backslash s$ \\
@@ -334,6 +336,12 @@
\[
match\;s\;r \;\dn\; nullable(r\backslash s)
\]
+This algorithm can be illustrated as follows:
+\begin{tikzcd}\label{graph:*}
+r_0 \arrow[r, "\backslash c_0"] & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed] & r_n
+\end{tikzcd}
+
+
One limitation, however, of Brzozowski's algorithm is that it only
produces a YES/NO answer for whether a string is being matched by a
regular expression. Sulzmann and Lu~\cite{Sulzmann2014} extended this
@@ -402,7 +410,7 @@
The contribution of Sulzmann and Lu is an extension of Brzozowski's
algorithm by a second phase (the first phase being building successive
derivatives). In this second phase, for every successful match the
-corresponding POSIX value is computed. The whole process looks like the following diagram(the working flow of the simple matching algorithm that just gives a $YES/NO$ answer is provided on the right for the purpose of comparison):\\
+corresponding POSIX value is computed. The whole process looks like the following diagram(the working flow of the simple matching algorithm that just gives a $YES/NO$ answer is given before \ref{graph:*}):\\
\begin{tikzcd}
r_0 \arrow[r, "\backslash c_0"] \arrow[d] & r_1 \arrow[r, "\backslash c_1"] \arrow[d] & r_2 \arrow[r, dashed] \arrow[d] & r_n \arrow[d, "mkeps" description] \\
v_0 & v_1 \arrow[l,"inj_{r_0} c_0"] & v_2 \arrow[l, "inj_{r_1} c_1"] & v_n \arrow[l, dashed]