ninems/ninems.tex
changeset 38 b5363c0dcd99
parent 37 17d8e7599a01
child 39 7d18745dd7c9
--- a/ninems/ninems.tex	Tue Jul 02 11:22:39 2019 +0100
+++ b/ninems/ninems.tex	Tue Jul 02 14:01:42 2019 +0100
@@ -70,7 +70,8 @@
   it matches the string.  This is important for applications such as
   lexing (tokenising a string). The problem is to make the algorithm
   by Sulzmann and Lu fast on all inputs without breaking its
-  correctness.
+  correctness. We have already developed some simplification rules, but have not shown that they 
+  preserve the correctness. We also have not yet looked at extended regular expressions.
 \end{abstract}
 
 \section{Introduction}
@@ -241,14 +242,13 @@
 
 \section{The Algorithms by  Brzozowski, and Sulzmann and Lu}
 
-Suppose regular expressions are given by the following grammar:\\
-			$r$  $::=$   $\ZERO$ $\mid$  $\ONE$   
-			 $\mid$  $c$          
-			 $\mid$  $r_1 \cdot r_2$
-			 $\mid$  $r_1 + r_2$   
-			 $\mid$  $r^*$         
-
-
+Suppose basic regular expressions are given by the following grammar:\\
+\[			r ::=   \ZERO \mid  \ONE
+			 \mid  c  
+			 \mid  r_1 \cdot r_2
+			 \mid  r_1 + r_2   
+			 \mid r^*         
+\]
 
 \noindent
 The intended meaning of the regular expressions is as usual: $\ZERO$
@@ -270,7 +270,7 @@
 			$\nullable(r^*)$       & $\dn$ & $\mathit{true}$ \\
 		\end{tabular}
 	\end{center}
-The function tests whether the empty string is in $L(r)$.
+This function simply tests whether the empty string is in $L(r)$.
 He then defined
 the following operation on regular expressions, written
 $r\backslash c$ (the derivative of $r$ w.r.t.~the character $c$):
@@ -291,16 +291,6 @@
 
 \noindent
 
- For us the main advantage is that derivatives can be
-straightforwardly implemented in any functional programming language,
-and are easily definable and reasoned about in theorem provers---the
-definitions just consist of inductive datatypes and simple recursive
-functions. Moreover, the notion of derivatives can be easily
-generalised to cover extended regular expression constructors such as
-the not-regular expression, written $\neg\,r$, or bounded repetitions
-(for example $r^{\{n\}}$ and $r^{\{n..m\}}$), which cannot be so
-straightforwardly realised within the classic automata approach.
-
 
 
  %Assuming the classic notion of a
@@ -314,14 +304,26 @@
 \end{center}
 
 \noindent
-So if we want to find out whether a string $s$
+ For us the main advantage is that derivatives can be
+straightforwardly implemented in any functional programming language,
+and are easily definable and reasoned about in theorem provers---the
+definitions just consist of inductive datatypes and simple recursive
+functions. Moreover, the notion of derivatives can be easily
+generalised to cover extended regular expression constructors such as
+the not-regular expression, written $\neg\,r$, or bounded repetitions
+(for example $r^{\{n\}}$ and $r^{\{n..m\}}$), which cannot be so
+straightforwardly realised within the classic automata approach.
+For the moment however, we focus only on the usual basic regular expressions.
+
+
+Now if we want to find out whether a string $s$
 matches with a regular expression $r$, build the derivatives of $r$
 w.r.t.\ (in succession) all the characters of the string $s$. Finally,
 test whether the resulting regular expression can match the empty
 string.  If yes, then $r$ matches $s$, and no in the negative
 case.
 
-We can generalise the derivative operation for strings like this:
+For this we can generalise the derivative operation for strings like this:
 \begin{center}
 \begin{tabular}{lcl}
 $r \backslash (c\!::\!s) $ & $\dn$ & $(r \backslash c) \backslash s$ \\
@@ -334,6 +336,12 @@
 \[
 match\;s\;r \;\dn\; nullable(r\backslash s)
 \]
+This algorithm can be illustrated as follows:
+\begin{tikzcd}\label{graph:*} 
+r_0 \arrow[r, "\backslash c_0"]  & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed]  & r_n  
+\end{tikzcd}
+
+
 One limitation, however, of Brzozowski's algorithm is that it only
 produces a YES/NO answer for whether a string is being matched by a
 regular expression.  Sulzmann and Lu~\cite{Sulzmann2014} extended this
@@ -402,7 +410,7 @@
 The contribution of Sulzmann and Lu is an extension of Brzozowski's
 algorithm by a second phase (the first phase being building successive
 derivatives). In this second phase, for every successful match the
-corresponding POSIX value is computed. The whole process looks like the following diagram(the working flow of the simple matching algorithm that just gives a $YES/NO$ answer is provided on the right for the purpose of comparison):\\
+corresponding POSIX value is computed. The whole process looks like the following diagram(the working flow of the simple matching algorithm that just gives a $YES/NO$ answer is given before \ref{graph:*}):\\
 \begin{tikzcd}
 r_0 \arrow[r, "\backslash c_0"]  \arrow[d] & r_1 \arrow[r, "\backslash c_1"] \arrow[d] & r_2 \arrow[r, dashed] \arrow[d] & r_n \arrow[d, "mkeps" description] \\
 v_0           & v_1 \arrow[l,"inj_{r_0} c_0"]                & v_2 \arrow[l, "inj_{r_1} c_1"]              & v_n \arrow[l, dashed]