# HG changeset patch # User Christian Urban # Date 1579305188 0 # Node ID 0a0c551bb3688bfce97568453296d6da945c89c6 # Parent b1e365afa29c558f66e3cd964b085b7fdce38fc5 updated diff -r b1e365afa29c -r 0a0c551bb368 etnms/etnms.tex --- a/etnms/etnms.tex Thu Jan 16 22:34:23 2020 +0000 +++ b/etnms/etnms.tex Fri Jan 17 23:53:08 2020 +0000 @@ -102,10 +102,11 @@ \end{abstract} \section{Recapitulation of Concepts From the Last Report} -\subsection{The Algorithm by Brzozowski based on Derivatives of Regular -Expressions} + +\subsection*{Regular Expressions and Derivatives} Suppose (basic) regular expressions are given by the following grammar: + \[ r ::= \ZERO \mid \ONE \mid c \mid r_1 \cdot r_2 @@ -114,9 +115,10 @@ \] \noindent +The ingenious contribution of Brzozowski is the notion of \emph{derivatives} of +regular expressions, written~$\_ \backslash \_$. It uses the auxiliary notion of +$\nullable$ defined below. -The ingenious contribution by Brzozowski is the notion of -\emph{derivatives} of regular expressions. \begin{center} \begin{tabular}{lcl} $\nullable(\ZERO)$ & $\dn$ & $\mathit{false}$ \\ @@ -154,9 +156,8 @@ \end{center} \noindent - - -Now we can generalise the derivative operation to strings like this: +We can generalise the derivative operation shown above for single characters +to strings as follows: \begin{center} \begin{tabular}{lcl} @@ -166,13 +167,16 @@ \end{center} \noindent -and then define as regular-expression matching algorithm: +and then define Brzozowski's regular-expression matching algorithm as: + \[ match\;s\;r \;\dn\; nullable(r\backslash s) \] \noindent -This algorithm looks graphically as follows: +Assuming the a string is givane as a sequence of characters, say $c_0c_1..c_n$, +this algorithm presented graphically is as follows: + \begin{equation}\label{graph:*} \begin{tikzcd} r_0 \arrow[r, "\backslash c_0"] & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed] & r_n \arrow[r,"\textit{nullable}?"] & \;\textrm{YES}/\textrm{NO} @@ -187,7 +191,7 @@ an $s = c_0...c_{n-1}$ and an $r_0$, it generates YES if and only if $s \in L(r_0)$). -\subsection{Values and the Algorithm by Sulzmann and Lu} +\subsection*{Values and the Lexing Algorithm by Sulzmann and Lu} One limitation of Brzozowski's algorithm is that it only produces a YES/NO answer for whether a string is being matched by a regular @@ -225,7 +229,6 @@ \end{center} \noindent - The contribution of Sulzmann and Lu is an extension of Brzozowski's algorithm by a second phase (the first phase being building successive derivatives---see \eqref{graph:*}). In this second phase, a POSIX value @@ -302,7 +305,7 @@ expressions and values. -\subsection{Simplification of Regular Expressions} +\subsection*{Simplification of Regular Expressions} The main drawback of building successive derivatives according to Brzozowski's definition is that they can grow very quickly in size. @@ -331,16 +334,16 @@ stay below this bound, we would need more aggressive simplifications. Essentially we need to delete useless $\ZERO$s and $\ONE$s, as well as deleting duplicates whenever possible. For example, the parentheses in -$(a+b) \cdot c + bc$ can be opened up to get $a\cdot c + b \cdot c + b +$(a+b) \cdot c + b\cdot c$ can be opened up to get $a\cdot c + b \cdot c + b \cdot c$, and then simplified to just $a \cdot c + b \cdot c$. Another example is simplifying $(a^*+a) + (a^*+ \ONE) + (a +\ONE)$ to just -$a^*+a+\ONE$. Adding these more aggressive simplification rules helps us +$a^*+a+\ONE$. Adding these more aggressive simplification rules help us to achieve the same size bound as that of the partial derivatives. In order to implement the idea of ``spilling out alternatives'' and to -make them compatible with the $\text{inj}$-mechanism, we use -\emph{bitcodes}. Bits and bitcodes (lists of bits) are just: - +make them compatible with the $\textit{inj}$-mechanism, we use +\emph{bitcodes}. They were first introduced by Sulzmann and Lu. +Here bits and bitcodes (lists of bits) are defined as: \begin{center} $b ::= S \mid Z \qquad @@ -574,6 +577,9 @@ operation from characters to strings (just like the derivatives for un-annotated regular expressions). + +\subsection*{Our Simplification Rules} + The main point of the bitcodes and annotated regular expressions is that we can apply rather aggressive (in terms of size) simplification rules in order to keep derivatives small. We have developed such