cst_tests: comparison ninems/ninems.tex

equal deleted inserted replaced

-:17d8e7599a01
+:b5363c0dcd99
 algorithm to not just give a YES/NO answer for whether or not a regular
 expression matches a string, but in case it matches also \emph{how}
 it matches the string.  This is important for applications such as
 lexing (tokenising a string). The problem is to make the algorithm
 by Sulzmann and Lu fast on all inputs without breaking its
-correctness.
+correctness. We have already developed some simplification rules, but have not shown that they
+preserve the correctness. We also have not yet looked at extended regular expressions.
 \end{abstract}
 \section{Introduction}
 This PhD-project is about regular expression matching and
 regular expressions \cite{Brzozowski1964}. We shall briefly explain
 the algorithms next.
 \section{The Algorithms by  Brzozowski, and Sulzmann and Lu}
-Suppose regular expressions are given by the following grammar:\\
+Suppose basic regular expressions are given by the following grammar:\\
-			$r$  $::=$   $\ZERO$ $\mid$  $\ONE$
+\[			r ::=   \ZERO \mid  \ONE
-			 $\mid$  $c$
+			 \mid  c
-			 $\mid$  $r_1 \cdot r_2$
+			 \mid  r_1 \cdot r_2
-			 $\mid$  $r_1 + r_2$
+			 \mid  r_1 + r_2
-			 $\mid$  $r^*$
+			 \mid r^*
+\]
 \noindent
 The intended meaning of the regular expressions is as usual: $\ZERO$
 cannot match any string, $\ONE$ can match the empty string, the
 character regular expression $c$ can match the character $c$, and so
 			$\nullable(r_1 + r_2)$ & $\dn$ & $\nullable(r_1) \vee \nullable(r_2)$ \\
 			$\nullable(r_1\cdot r_2)$  & $\dn$ & $\nullable(r_1) \wedge \nullable(r_2)$ \\
 			$\nullable(r^*)$       & $\dn$ & $\mathit{true}$ \\
 		\end{tabular}
 	\end{center}
-The function tests whether the empty string is in $L(r)$.
+This function simply tests whether the empty string is in $L(r)$.
 He then defined
 the following operation on regular expressions, written
 $r\backslash c$ (the derivative of $r$ w.r.t.~the character $c$):
 \begin{center}
 \end{tabular}
 \end{center}
 \noindent
+%Assuming the classic notion of a
+%\emph{language} of a regular expression, written $L(\_)$, t
+The main
+property of the derivative operation is that
+\begin{center}
+$c\!::\!s \in L(r)$ holds
+if and only if $s \in L(r\backslash c)$.
+\end{center}
+\noindent
 For us the main advantage is that derivatives can be
 straightforwardly implemented in any functional programming language,
 and are easily definable and reasoned about in theorem provers---the
 definitions just consist of inductive datatypes and simple recursive
 functions. Moreover, the notion of derivatives can be easily
 generalised to cover extended regular expression constructors such as
 the not-regular expression, written $\neg\,r$, or bounded repetitions
 (for example $r^{\{n\}}$ and $r^{\{n..m\}}$), which cannot be so
 straightforwardly realised within the classic automata approach.
+For the moment however, we focus only on the usual basic regular expressions.
-%Assuming the classic notion of a
+Now if we want to find out whether a string $s$
-%\emph{language} of a regular expression, written $L(\_)$, t
-The main
-property of the derivative operation is that
-\begin{center}
-$c\!::\!s \in L(r)$ holds
-if and only if $s \in L(r\backslash c)$.
-\end{center}
-\noindent
-So if we want to find out whether a string $s$
 matches with a regular expression $r$, build the derivatives of $r$
 w.r.t.\ (in succession) all the characters of the string $s$. Finally,
 test whether the resulting regular expression can match the empty
 string.  If yes, then $r$ matches $s$, and no in the negative
 case.
-We can generalise the derivative operation for strings like this:
+For this we can generalise the derivative operation for strings like this:
 \begin{center}
 \begin{tabular}{lcl}
 $r \backslash (c\!::\!s) $ & $\dn$ & $(r \backslash c) \backslash s$ \\
 $r \backslash \epsilon $ & $\dn$ & $r$
 \end{tabular}
 Using the above definition we obtain a simple and elegant regular
 expression matching algorithm:
 \[
 match\;s\;r \;\dn\; nullable(r\backslash s)
 \]
+This algorithm can be illustrated as follows:
+\begin{tikzcd}\label{graph:*}
+r_0 \arrow[r, "\backslash c_0"]  & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed]  & r_n
+\end{tikzcd}
 One limitation, however, of Brzozowski's algorithm is that it only
 produces a YES/NO answer for whether a string is being matched by a
 regular expression.  Sulzmann and Lu~\cite{Sulzmann2014} extended this
 algorithm to allow generation of an actual matching, called a
 \emph{value}.
 expression.
 The contribution of Sulzmann and Lu is an extension of Brzozowski's
 algorithm by a second phase (the first phase being building successive
 derivatives). In this second phase, for every successful match the
-corresponding POSIX value is computed. The whole process looks like the following diagram(the working flow of the simple matching algorithm that just gives a $YES/NO$ answer is provided on the right for the purpose of comparison):\\
+corresponding POSIX value is computed. The whole process looks like the following diagram(the working flow of the simple matching algorithm that just gives a $YES/NO$ answer is given before \ref{graph:*}):\\
 \begin{tikzcd}
 r_0 \arrow[r, "\backslash c_0"]  \arrow[d] & r_1 \arrow[r, "\backslash c_1"] \arrow[d] & r_2 \arrow[r, dashed] \arrow[d] & r_n \arrow[d, "mkeps" description] \\
 v_0           & v_1 \arrow[l,"inj_{r_0} c_0"]                & v_2 \arrow[l, "inj_{r_1} c_1"]              & v_n \arrow[l, dashed]
 \end{tikzcd}
 \begin{tikzcd}

changeset 38	b5363c0dcd99
parent 37	17d8e7599a01
child 39	7d18745dd7c9