cst_tests: comparison etnms/etnms.tex

-:b1e365afa29c
+:0a0c551bb368
 at extended regular expressions, such as bounded repetitions,
 negation and back-references.
 \end{abstract}
 \section{Recapitulation of Concepts From the Last Report}
-\subsection{The Algorithm by Brzozowski based on Derivatives of Regular
-Expressions}
+\subsection*{Regular Expressions and Derivatives}
 Suppose (basic) regular expressions are given by the following grammar:
 \[			r ::=   \ZERO \mid  \ONE
 			 \mid  c
 			 \mid  r_1 \cdot r_2
 			 \mid  r_1 + r_2
 			 \mid r^*
 \]
 \noindent
+The ingenious contribution of Brzozowski is the notion of \emph{derivatives} of
-The ingenious contribution by Brzozowski is the notion of
+regular expressions, written~$\_ \backslash \_$. It uses the auxiliary notion of
-\emph{derivatives} of regular expressions.
+$\nullable$ defined below.
 \begin{center}
 		\begin{tabular}{lcl}
 			$\nullable(\ZERO)$     & $\dn$ & $\mathit{false}$ \\
 			$\nullable(\ONE)$      & $\dn$ & $\mathit{true}$ \\
 			$\nullable(c)$ 	       & $\dn$ & $\mathit{false}$ \\
 $c\!::\!s \in L(r)$ holds
 if and only if $s \in L(r\backslash c)$.
 \end{center}
 \noindent
+We can generalise the derivative operation shown above for single characters
+to strings as follows:
-Now we can generalise the derivative operation to strings like this:
 \begin{center}
 \begin{tabular}{lcl}
 $r \backslash (c\!::\!s) $ & $\dn$ & $(r \backslash c) \backslash s$ \\
 $r \backslash [\,] $ & $\dn$ & $r$
 \end{tabular}
 \end{center}
 \noindent
-and then define as  regular-expression matching algorithm:
+and then define Brzozowski's  regular-expression matching algorithm as:
 \[
 \textit{match}\;s\;r \;\dn\; \nullable(r\backslash s)
 \]
 \noindent
-This algorithm looks graphically as follows:
+Assuming a string is given as a sequence of characters, say $c_0c_1\ldots c_{n-1}$,
+this algorithm can be presented graphically as follows:
 \begin{equation}\label{graph:*}
 \begin{tikzcd}
 r_0 \arrow[r, "\backslash c_0"]  & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed]  & r_n  \arrow[r,"\textit{nullable}?"] & \;\textrm{YES}/\textrm{NO}
 \end{tikzcd}
 \end{equation}
 to test whether the result can match the empty string. It can be shown
 relatively easily that this matcher is correct (that is, given
 an $s = c_0\ldots c_{n-1}$ and an $r_0$, it generates YES if and only if $s \in L(r_0)$).
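 \noindent
 As a small worked example of this matcher, take $r_0 = a \cdot b$ and the
 string $ab$. The sketch below assumes the standard clauses of the derivative
 and of $\nullable$ for characters, sequences and alternatives, which are not
 repeated in this excerpt:
 \begin{center}
 \begin{tabular}{lcl}
 $r_1 = r_0 \backslash a$ & $=$ & $(a \backslash a) \cdot b \;=\; \ONE \cdot b$ \\
 $r_2 = r_1 \backslash b$ & $=$ & $(\ONE \backslash b) \cdot b + b \backslash b \;=\; \ZERO \cdot b + \ONE$ \\
 $\nullable(r_2)$ & $=$ & $\nullable(\ZERO \cdot b) \vee \nullable(\ONE) \;=\; \mathit{true}$
 \end{tabular}
 \end{center}
 \noindent
 Hence $\textit{match}\;ab\;(a \cdot b)$ yields YES. The first step produces a
 single summand because $a$ is not nullable, whereas the second step produces
 two because $\ONE$ is nullable. Note that $r_2$ already contains a $\ZERO$
 that contributes nothing to the language.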
-\subsection{Values and the Algorithm by Sulzmann and Lu}
+\subsection*{Values and the Lexing Algorithm by Sulzmann and Lu}
 One limitation of Brzozowski's algorithm is that it only produces a
 YES/NO answer for whether a string is being matched by a regular
 expression.  Sulzmann and Lu~\cite{Sulzmann2014} extended this algorithm
 to allow generation of an actual matching, called a \emph{value} or
 		\end{tabular}
 	\end{tabular}
 \end{center}
 \noindent
 The contribution of Sulzmann and Lu is an extension of Brzozowski's
 algorithm by a second phase (the first phase being building successive
 derivatives---see \eqref{graph:*}). In this second phase, a POSIX value
 is generated in case the regular expression matches  the string.
 Pictorially, the Sulzmann and Lu algorithm is as follows:
 \noindent This definition is by recursion on the ``shape'' of regular
 expressions and values.
-\subsection{Simplification of Regular Expressions}
+\subsection*{Simplification of Regular Expressions}
 The main drawback of building successive derivatives according
 to Brzozowski's definition is that they can grow very quickly in size.
 This is mainly due to the fact that the derivative operation often
 generates ``useless'' $\ZERO$s and $\ONE$s in derivatives.  As a result, if
 If we want the size of derivatives in Sulzmann and Lu's algorithm to
 stay below this bound, we would need more aggressive simplifications.
 Essentially we need to delete useless $\ZERO$s and $\ONE$s, as well as
 deleting duplicates whenever possible. For example, the parentheses in
-$(a+b) \cdot c + bc$ can be opened up to get $a\cdot c +  b \cdot c + b
+$(a+b) \cdot c + b\cdot c$ can be opened up to get $a\cdot c + b \cdot c + b
 \cdot c$, and then simplified to just $a \cdot c + b \cdot c$. Another
 example is simplifying $(a^*+a) + (a^*+ \ONE) + (a +\ONE)$ to just
-$a^*+a+\ONE$. Adding these more aggressive simplification rules helps us
+$a^*+a+\ONE$. Adding these more aggressive simplification rules helps us
 to achieve the same size bound as that of the partial derivatives.
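 \noindent
 Spelled out, the second example can be read as two steps, assuming that
 nested alternatives are first flattened and that of several equal
 alternatives only the leftmost one is kept:
 \[
 (a^*+a) + (a^*+ \ONE) + (a +\ONE)
 \;\longrightarrow\; a^* + a + a^* + \ONE + a + \ONE
 \;\longrightarrow\; a^* + a + \ONE
 \]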
 In order to implement the idea of ``spilling out alternatives'' and to
-make them compatible with the $\text{inj}$-mechanism, we use
+make them compatible with the $\textit{inj}$-mechanism, we use
-\emph{bitcodes}. Bits and bitcodes (lists of bits) are just:
+\emph{bitcodes}. They were first introduced by Sulzmann and Lu.
+Here bits and bitcodes (lists of bits) are defined as:
 \begin{center}
 		$b ::=   S \mid  Z \qquad
 bs ::= [] \mid b:bs
 $
 \noindent
 In this definition $\_\backslash s$ is the  generalisation  of the derivative
 operation from characters to strings (just like the derivatives for un-annotated
 regular expressions).
+\subsection*{Our Simplification Rules}
 The main point of the bitcodes and annotated regular expressions is that
 we can apply rather aggressive (in terms of size) simplification rules
 in order to keep derivatives small. We have developed such
 ``aggressive'' simplification rules and generated test data that show

changeset 108	0a0c551bb368
parent 107	b1e365afa29c
child 109	79f347cb8b4d