cst_tests: changeset 40:8063792920ef

--- a/ninems/ninems.tex	Tue Jul 02 22:24:27 2019 +0100
+++ b/ninems/ninems.tex	Wed Jul 03 09:48:42 2019 +0100
@@ -65,13 +65,16 @@
   Brzozowski introduced in 1964 a beautifully simple algorithm for
   regular expression matching based on the notion of derivatives of
   regular expressions. In 2014, Sulzmann and Lu extended this
-  algorithm to not just give a YES/NO answer for whether or not a regular
-  expression matches a string, but in case it matches also \emph{how}
-  it matches the string.  This is important for applications such as
-  lexing (tokenising a string). The problem is to make the algorithm
-  by Sulzmann and Lu fast on all inputs without breaking its
-  correctness. We have already developed some simplification rules, but have not shown that they 
-  preserve the correctness. We also have not yet looked at extended regular expressions.
+  algorithm to not just give a YES/NO answer for whether or not a
+  regular expression matches a string, but in case it matches also
+  \emph{how} it matches the string.  This is important for
+  applications such as lexing (tokenising a string). The problem is to
+  make the algorithm by Sulzmann and Lu fast on all inputs without
+  breaking its correctness. We have already developed some
+  simplification rules for this, but have not proved that they
+  preserve the correctness of the algorithm. We also have not yet
+  looked at extended regular expressions, such as bounded repetitions,
+  negation and back-references.
 \end{abstract}
 
 \section{Introduction}
@@ -242,7 +245,7 @@
 
 \section{The Algorithms by  Brzozowski, and Sulzmann and Lu}
 
-Suppose basic regular expressions are given by the following grammar:\\
+Suppose (basic) regular expressions are given by the following grammar:
 \[			r ::=   \ZERO \mid  \ONE
 			 \mid  c  
 			 \mid  r_1 \cdot r_2
@@ -251,15 +254,18 @@
 \]
 
 \noindent
-The intended meaning of the regular expressions is as usual: $\ZERO$
+The intended meaning of the constructors is as usual: $\ZERO$
 cannot match any string, $\ONE$ can match the empty string, the
 character regular expression $c$ can match the character $c$, and so
-on. The brilliant contribution by Brzozowski is the notion of
+on.
+
+The brilliant contribution by Brzozowski is the notion of
 \emph{derivatives} of regular expressions.  The idea behind this
 notion is as follows: suppose a regular expression $r$ can match a
 string of the form $c\!::\! s$ (that is a list of characters starting
 with $c$), what does the regular expression look like that can match
-just $s$? Brzozowski gave a neat answer to this question. He started with the definition of $nullable$:
+just $s$? Brzozowski gave a neat answer to this question. He started
+with the definition of $nullable$:
 \begin{center}
 		\begin{tabular}{lcl}
 			$\nullable(\ZERO)$     & $\dn$ & $\mathit{false}$ \\  
@@ -295,8 +301,8 @@
 
  %Assuming the classic notion of a
 %\emph{language} of a regular expression, written $L(\_)$, t
-The main
-property of the derivative operation is that
+\noindent
+The main property of the derivative operation is that
 
 \begin{center}
 $c\!::\!s \in L(r)$ holds
@@ -316,37 +322,46 @@
 For the moment however, we focus only on the usual basic regular expressions.
 
 
-Now if we want to find out whether a string $s$
-matches with a regular expression $r$, build the derivatives of $r$
-w.r.t.\ (in succession) all the characters of the string $s$. Finally,
-test whether the resulting regular expression can match the empty
-string.  If yes, then $r$ matches $s$, and no in the negative
-case.
-
-For this we can generalise the derivative operation for strings like this:
+Now if we want to find out whether a string $s$ matches with a regular
+expression $r$, build the derivatives of $r$ w.r.t.\ (in succession)
+all the characters of the string $s$. Finally, test whether the
+resulting regular expression can match the empty string.  If yes, then
+$r$ matches $s$, and no in the negative case. To implement this idea
+we can generalise the derivative operation to strings like this:
 \begin{center}
 \begin{tabular}{lcl}
 $r \backslash (c\!::\!s) $ & $\dn$ & $(r \backslash c) \backslash s$ \\
-$r \backslash \epsilon $ & $\dn$ & $r$
+$r \backslash [\,] $ & $\dn$ & $r$
 \end{tabular}
 \end{center}
+
 \noindent
-Using the above definition we obtain a simple and elegant regular
+Using this definition we obtain a simple and elegant regular
 expression matching algorithm: 
 \[
 match\;s\;r \;\dn\; nullable(r\backslash s)
 \]
-This algorithm can be illustrated as follows:
+
+\noindent
+Pictorially this algorithm can be illustrated as follows:
+
+\begin{center}
 \begin{tikzcd}\label{graph:*} 
 r_0 \arrow[r, "\backslash c_0"]  & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed]  & r_n  \arrow[r,"nullable?"] & Yes/No
 \end{tikzcd}
-where we bnuild the successive derivative until we exhaust the string.
+\end{center}
+
+\noindent
+where we start with  a regular expression  $r_0$, build successive derivatives
+until we exhaust the string and then use \textit{nullable} to test whether the
+result can match the empty string. It can  be relatively  easily shown that this
+matcher is correct.
 
 One limitation, however, of Brzozowski's algorithm is that it only
 produces a YES/NO answer for whether a string is being matched by a
 regular expression.  Sulzmann and Lu~\cite{Sulzmann2014} extended this
 algorithm to allow generation of an actual matching, called a
-\emph{value}.  
+\emph{value}.
 
 \begin{center}
 	\begin{tabular}{c@{\hspace{20mm}}c}
author	Christian Urban <urbanc@in.tum.de>
	Wed, 03 Jul 2019 09:48:42 +0100
changeset 40	8063792920ef
parent 39	7d18745dd7c9
child 41	a1f90febbc7f