# HG changeset patch # User Chengsong # Date 1581197825 0 # Node ID 1260b383ae2ca460109fef1597cc165f4036be05 # Parent 788f4aa28bc5e12f15feaad6f370aec68b7ba312 forfear diff -r 788f4aa28bc5 -r 1260b383ae2c etnms/etnms.tex --- a/etnms/etnms.tex Sat Feb 08 21:34:50 2020 +0000 +++ b/etnms/etnms.tex Sat Feb 08 21:37:05 2020 +0000 @@ -107,6 +107,25 @@ negation and back-references. \end{abstract} + +\section{Introduction} + +While we believe derivatives of regular expressions, written +$r\backslash s$, are a beautiful concept (in terms of ease of +implementing them in functional programming languages and in terms of +reasoning about them formally), they have one major drawback: every +derivative step can make regular expressions grow drastically in +size. This in turn has negative effect on the runtime of the +corresponding lexing algorithms. Consider for example the regular +expression $(a+aa)^*$ and the short string $aaaaaaaaaaaa$. The +corresponding derivative contains already 8668 nodes where we assume +the derivative is given as a tree. The reason for the poor runtime of +the derivative-based lexing algorithms is that they need to traverse +such trees over and over again. The solution is to find a complete set +of simplification rules that keep the sizes of derivatives uniformly +small. + + \section{Recapitulation of Concepts From the Last Report} \subsection*{Regular Expressions and Derivatives} @@ -735,23 +754,7 @@ -\section{Introduction} - -While we believe derivatives of regular expressions, written -$r\backslash s$, are a beautiful concept (in terms of ease of -implementing them in functional programming languages and in terms of -reasoning about them formally), they have one major drawback: every -derivative step can make regular expressions grow drastically in -size. This in turn has negative effect on the runtime of the -corresponding lexing algorithms. Consider for example the regular -expression $(a+aa)^*$ and the short string $aaaaaaaaaaaa$. The -corresponding derivative contains already 8668 nodes where we assume -the derivative is given as a tree. The reason for the poor runtime of -the derivative-based lexing algorithms is that they need to traverse -such trees over and over again. The solution is to find a complete set -of simplification rules that keep the sizes of derivatives uniformly -small. - +\section{Current Work and Progress} For reasons beyond this report, it turns out that a complete set of simplification rules depends on values being encoded as bitsequences.\footnote{Values are the results the lexing algorithms