cst_tests: comparison ninems/ninems.tex

-:7d18745dd7c9
+:8063792920ef
 \begin{abstract}
 Brzozowski introduced in 1964 a beautifully simple algorithm for
 regular expression matching based on the notion of derivatives of
 regular expressions. In 2014, Sulzmann and Lu extended this
-algorithm to not just give a YES/NO answer for whether or not a regular
+algorithm to not just give a YES/NO answer for whether or not a
-expression matches a string, but in case it matches also \emph{how}
+regular expression matches a string, but in case it matches also
-it matches the string.  This is important for applications such as
+\emph{how} it matches the string.  This is important for
-lexing (tokenising a string). The problem is to make the algorithm
+applications such as lexing (tokenising a string). The problem is to
-by Sulzmann and Lu fast on all inputs without breaking its
+make the algorithm by Sulzmann and Lu fast on all inputs without
-correctness. We have already developed some simplification rules, but have not shown that they
+breaking its correctness. We have already developed some
-preserve the correctness. We also have not yet looked at extended regular expressions.
+simplification rules for this, but have not proved that they
+preserve the correctness of the algorithm. We also have not yet
+looked at extended regular expressions, such as bounded repetitions,
+negation and back-references.
 \end{abstract}
 \section{Introduction}
 This PhD-project is about regular expression matching and
 regular expressions \cite{Brzozowski1964}. We shall briefly explain
 the algorithms next.
 \section{The Algorithms by  Brzozowski, and Sulzmann and Lu}
-Suppose basic regular expressions are given by the following grammar:\\
+Suppose (basic) regular expressions are given by the following grammar:
 \[			r ::=   \ZERO \mid  \ONE
 			 \mid  c
 			 \mid  r_1 \cdot r_2
 			 \mid  r_1 + r_2
 			 \mid r^*
 \]
 \noindent
-The intended meaning of the regular expressions is as usual: $\ZERO$
+The intended meaning of the constructors is as usual: $\ZERO$
 cannot match any string, $\ONE$ can match the empty string, the
 character regular expression $c$ can match the character $c$, and so
-on. The brilliant contribution by Brzozowski is the notion of
+on.
+The brilliant contribution by Brzozowski is the notion of
 \emph{derivatives} of regular expressions.  The idea behind this
 notion is as follows: suppose a regular expression $r$ can match a
 string of the form $c\!::\! s$ (that is a list of characters starting
 with $c$), what does the regular expression look like that can match
-just $s$? Brzozowski gave a neat answer to this question. He started with the definition of $nullable$:
+just $s$? Brzozowski gave a neat answer to this question. He started
+with the definition of $nullable$:
 \begin{center}
 		\begin{tabular}{lcl}
 			$\nullable(\ZERO)$     & $\dn$ & $\mathit{false}$ \\
 			$\nullable(\ONE)$      & $\dn$ & $\mathit{true}$ \\
 			$\nullable(c)$ 	       & $\dn$ & $\mathit{false}$ \\
 %Assuming the classic notion of a
 %\emph{language} of a regular expression, written $L(\_)$, t
-The main
+\noindent
-property of the derivative operation is that
+The main property of the derivative operation is that
 \begin{center}
 $c\!::\!s \in L(r)$ holds
 if and only if $s \in L(r\backslash c)$.
 \end{center}
 (for example $r^{\{n\}}$ and $r^{\{n..m\}}$), which cannot be so
 straightforwardly realised within the classic automata approach.
 For the moment however, we focus only on the usual basic regular expressions.
-Now if we want to find out whether a string $s$
+Now if we want to find out whether a string $s$ matches with a regular
-matches with a regular expression $r$, build the derivatives of $r$
+expression $r$, we build the derivatives of $r$ w.r.t.\ (in succession)
-w.r.t.\ (in succession) all the characters of the string $s$. Finally,
+all the characters of the string $s$. Finally, test whether the
-test whether the resulting regular expression can match the empty
+resulting regular expression can match the empty string.  If yes, then
-string.  If yes, then $r$ matches $s$, and no in the negative
+$r$ matches $s$; otherwise it does not. To implement this idea
-case.
+we can generalise the derivative operation to strings like this:
-For this we can generalise the derivative operation for strings like this:
 \begin{center}
 \begin{tabular}{lcl}
 $r \backslash (c\!::\!s) $ & $\dn$ & $(r \backslash c) \backslash s$ \\
-$r \backslash \epsilon $ & $\dn$ & $r$
+$r \backslash [\,] $ & $\dn$ & $r$
 \end{tabular}
 \end{center}
-\noindent
-Using the above definition we obtain a simple and elegant regular
+\noindent
+Using this definition we obtain a simple and elegant regular
 expression matching algorithm:
 \[
 match\;s\;r \;\dn\; nullable(r\backslash s)
 \]
-This algorithm can be illustrated as follows:
+\noindent
+Pictorially, this algorithm can be illustrated as follows:
+\begin{center}
 \begin{tikzcd}\label{graph:*}
 r_0 \arrow[r, "\backslash c_0"]  & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed]  & r_n  \arrow[r,"nullable?"] & Yes/No
 \end{tikzcd}
-where we bnuild the successive derivative until we exhaust the string.
+\end{center}
+\noindent
+where we start with a regular expression $r_0$, build successive derivatives
+until we exhaust the string, and then use \textit{nullable} to test whether the
+result can match the empty string. It can be shown relatively easily that this
+matcher is correct.
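 \noindent
 As a small worked illustration of the matcher, assume the standard
 derivative clauses, which are not reproduced in this excerpt:
 $c \backslash c = \ONE$, $\ONE \backslash c = \ZERO$, and
 $(r_1 \cdot r_2)\backslash c$ is
 $(r_1\backslash c)\cdot r_2 + r_2\backslash c$ if $\nullable(r_1)$ holds
 and $(r_1\backslash c)\cdot r_2$ otherwise. Under these assumptions,
 matching the string $a\!::\!b\!::\![\,]$ against $a \cdot b$ would
 proceed as follows:
 \begin{center}
 \begin{tabular}{lcl}
 $(a \cdot b)\backslash a$ & $=$ & $\ONE \cdot b$ \\
 $(\ONE \cdot b)\backslash b$ & $=$ & $(\ZERO \cdot b) + \ONE$ \\
 $\nullable((\ZERO \cdot b) + \ONE)$ & $=$ & $\mathit{true}$
 \end{tabular}
 \end{center}
 \noindent
 The final regular expression is nullable because its right-hand
 alternative is $\ONE$, so the matcher answers YES for this string.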
 One limitation, however, of Brzozowski's algorithm is that it only
 produces a YES/NO answer for whether a string is being matched by a
 regular expression.  Sulzmann and Lu~\cite{Sulzmann2014} extended this
 algorithm to allow generation of an actual matching, called a
 \emph{value}.
 \begin{center}
 	\begin{tabular}{c@{\hspace{20mm}}c}
 		\begin{tabular}{@{}rrl@{}}
 			\multicolumn{3}{@{}l}{\textbf{Regular Expressions}}\medskip\\

changeset 40	8063792920ef
parent 39	7d18745dd7c9
child 41	a1f90febbc7f