# HG changeset patch # User Chengsong # Date 1562072502 -3600 # Node ID b5363c0dcd99d27a24b89180c0ca7dd5e6e51544 # Parent 17d8e7599a01d0eb4aef17b086c8aeeb3c68bf29 half easy changes diff -r 17d8e7599a01 -r b5363c0dcd99 ninems/ninems.tex --- a/ninems/ninems.tex Tue Jul 02 11:22:39 2019 +0100 +++ b/ninems/ninems.tex Tue Jul 02 14:01:42 2019 +0100 @@ -70,7 +70,8 @@ it matches the string. This is important for applications such as lexing (tokenising a string). The problem is to make the algorithm by Sulzmann and Lu fast on all inputs without breaking its - correctness. + correctness. We have already developed some simplification rules, but have not shown that they + preserve the correctness. We also have not yet looked at extended regular expressions. \end{abstract} \section{Introduction} @@ -241,14 +242,13 @@ \section{The Algorithms by Brzozowski, and Sulzmann and Lu} -Suppose regular expressions are given by the following grammar:\\ - $r$ $::=$ $\ZERO$ $\mid$ $\ONE$ - $\mid$ $c$ - $\mid$ $r_1 \cdot r_2$ - $\mid$ $r_1 + r_2$ - $\mid$ $r^*$ - - +Suppose basic regular expressions are given by the following grammar:\\ +\[ r ::= \ZERO \mid \ONE + \mid c + \mid r_1 \cdot r_2 + \mid r_1 + r_2 + \mid r^* +\] \noindent The intended meaning of the regular expressions is as usual: $\ZERO$ @@ -270,7 +270,7 @@ $\nullable(r^*)$ & $\dn$ & $\mathit{true}$ \\ \end{tabular} \end{center} -The function tests whether the empty string is in $L(r)$. +This function simply tests whether the empty string is in $L(r)$. He then defined the following operation on regular expressions, written $r\backslash c$ (the derivative of $r$ w.r.t.~the character $c$): @@ -291,16 +291,6 @@ \noindent - For us the main advantage is that derivatives can be -straightforwardly implemented in any functional programming language, -and are easily definable and reasoned about in theorem provers---the -definitions just consist of inductive datatypes and simple recursive -functions. Moreover, the notion of derivatives can be easily -generalised to cover extended regular expression constructors such as -the not-regular expression, written $\neg\,r$, or bounded repetitions -(for example $r^{\{n\}}$ and $r^{\{n..m\}}$), which cannot be so -straightforwardly realised within the classic automata approach. - %Assuming the classic notion of a @@ -314,14 +304,26 @@ \end{center} \noindent -So if we want to find out whether a string $s$ + For us the main advantage is that derivatives can be +straightforwardly implemented in any functional programming language, +and are easily definable and reasoned about in theorem provers---the +definitions just consist of inductive datatypes and simple recursive +functions. Moreover, the notion of derivatives can be easily +generalised to cover extended regular expression constructors such as +the not-regular expression, written $\neg\,r$, or bounded repetitions +(for example $r^{\{n\}}$ and $r^{\{n..m\}}$), which cannot be so +straightforwardly realised within the classic automata approach. +For the moment however, we focus only on the usual basic regular expressions. + + +Now if we want to find out whether a string $s$ matches with a regular expression $r$, build the derivatives of $r$ w.r.t.\ (in succession) all the characters of the string $s$. Finally, test whether the resulting regular expression can match the empty string. If yes, then $r$ matches $s$, and no in the negative case. -We can generalise the derivative operation for strings like this: +For this we can generalise the derivative operation for strings like this: \begin{center} \begin{tabular}{lcl} $r \backslash (c\!::\!s) $ & $\dn$ & $(r \backslash c) \backslash s$ \\ @@ -334,6 +336,12 @@ \[ match\;s\;r \;\dn\; nullable(r\backslash s) \] +This algorithm can be illustrated as follows: +\begin{tikzcd}\label{graph:*} +r_0 \arrow[r, "\backslash c_0"] & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed] & r_n +\end{tikzcd} + + One limitation, however, of Brzozowski's algorithm is that it only produces a YES/NO answer for whether a string is being matched by a regular expression. Sulzmann and Lu~\cite{Sulzmann2014} extended this @@ -402,7 +410,7 @@ The contribution of Sulzmann and Lu is an extension of Brzozowski's algorithm by a second phase (the first phase being building successive derivatives). In this second phase, for every successful match the -corresponding POSIX value is computed. The whole process looks like the following diagram(the working flow of the simple matching algorithm that just gives a $YES/NO$ answer is provided on the right for the purpose of comparison):\\ +corresponding POSIX value is computed. The whole process looks like the following diagram(the working flow of the simple matching algorithm that just gives a $YES/NO$ answer is given before \ref{graph:*}):\\ \begin{tikzcd} r_0 \arrow[r, "\backslash c_0"] \arrow[d] & r_1 \arrow[r, "\backslash c_1"] \arrow[d] & r_2 \arrow[r, dashed] \arrow[d] & r_n \arrow[d, "mkeps" description] \\ v_0 & v_1 \arrow[l,"inj_{r_0} c_0"] & v_2 \arrow[l, "inj_{r_1} c_1"] & v_n \arrow[l, dashed]