etnms/etnms.tex
changeset 126 1260b383ae2c
parent 125 788f4aa28bc5
child 127 580e044af0f7
equal deleted inserted replaced
125:788f4aa28bc5 126:1260b383ae2c
   105   formally established their correctness. We have also not yet looked 
   105   formally established their correctness. We have also not yet looked 
   106   at extended regular expressions, such as bounded repetitions,
   106   at extended regular expressions, such as bounded repetitions,
   107   negation and back-references.
   107   negation and back-references.
   108 \end{abstract}
   108 \end{abstract}
   109 
   109 
       
   110 
       
   111 \section{Introduction}
       
   112 
       
   113 While we believe derivatives of regular expressions, written
       
   114 $r\backslash s$, are a beautiful concept (in terms of ease of
       
   115 implementing them in functional programming languages and in terms of
       
   116 reasoning about them formally), they have one major drawback: every
       
   117 derivative step can make regular expressions grow drastically in
       
   118 size. This in turn has negative effect on the runtime of the
       
   119 corresponding lexing algorithms. Consider for example the regular
       
   120 expression $(a+aa)^*$ and the short string $aaaaaaaaaaaa$. The
       
   121 corresponding derivative contains already 8668 nodes where we assume
       
   122 the derivative is given as a tree. The reason for the poor runtime of
       
   123 the derivative-based lexing algorithms is that they need to traverse
       
   124 such trees over and over again. The solution is to find a complete set
       
   125 of simplification rules that keep the sizes of derivatives uniformly
       
   126 small.
       
   127 
       
   128 
   110 \section{Recapitulation of Concepts From the Last Report}
   129 \section{Recapitulation of Concepts From the Last Report}
   111 
   130 
   112 \subsection*{Regular Expressions and Derivatives}
   131 \subsection*{Regular Expressions and Derivatives}
   113 
   132 
   114 Suppose (basic) regular expressions are given by the following grammar:
   133 Suppose (basic) regular expressions are given by the following grammar:
   733 will be reduced to just 6 and stays constant, no matter how long the
   752 will be reduced to just 6 and stays constant, no matter how long the
   734 input string is.
   753 input string is.
   735 
   754 
   736 
   755 
   737 
   756 
   738 \section{Introduction}
   757 \section{Current Work and Progress}
   739 
       
   740 While we believe derivatives of regular expressions, written
       
   741 $r\backslash s$, are a beautiful concept (in terms of ease of
       
   742 implementing them in functional programming languages and in terms of
       
   743 reasoning about them formally), they have one major drawback: every
       
   744 derivative step can make regular expressions grow drastically in
       
   745 size. This in turn has negative effect on the runtime of the
       
   746 corresponding lexing algorithms. Consider for example the regular
       
   747 expression $(a+aa)^*$ and the short string $aaaaaaaaaaaa$. The
       
   748 corresponding derivative contains already 8668 nodes where we assume
       
   749 the derivative is given as a tree. The reason for the poor runtime of
       
   750 the derivative-based lexing algorithms is that they need to traverse
       
   751 such trees over and over again. The solution is to find a complete set
       
   752 of simplification rules that keep the sizes of derivatives uniformly
       
   753 small.
       
   754 
       
   755 For reasons beyond this report, it turns out that a complete set of
   758 For reasons beyond this report, it turns out that a complete set of
   756 simplification rules depends on values being encoded as
   759 simplification rules depends on values being encoded as
   757 bitsequences.\footnote{Values are the results the lexing algorithms
   760 bitsequences.\footnote{Values are the results the lexing algorithms
   758   generate; they encode how a regular expression matched a string.} We
   761   generate; they encode how a regular expression matched a string.} We
   759 already know that the lexing algorithm using bitsequences but
   762 already know that the lexing algorithm using bitsequences but