msc-projects-14.html
changeset 334 3238418b70ca
parent 333 846966afdad1
child 335 7338cef94b2b
333:846966afdad1 334:3238418b70ca
    47 <li> <H4>[CU1] Regular Expression Matching and Derivatives</H4>
    47 <li> <H4>[CU1] Regular Expression Matching and Derivatives</H4>
    48 
    48 
    49   <p>
    49   <p>
    50   <B>Description:</b>  
    50   <B>Description:</b>  
    51   <A HREF="http://en.wikipedia.org/wiki/Regular_expression">Regular expressions</A> 
    51   <A HREF="http://en.wikipedia.org/wiki/Regular_expression">Regular expressions</A> 
    52   are extremely useful for many text-processing tasks such as finding patterns in texts,
    52   are extremely useful for many text-processing tasks, such as finding patterns in texts,
    53   lexing programs, syntax highlighting and so on. Given that regular expressions were
    53   lexing programs, syntax highlighting and so on. Given that regular expressions were
    54   introduced in the 1950s by <A HREF="http://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</A>,
    54   introduced in the 1950s by <A HREF="http://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</A>,
    55   you might think regular expressions have since been studied and implemented to death. But you would definitely be
    55   you might think regular expressions have since been studied and implemented to death. But you would definitely be
    56   mistaken: in fact, they are still an active research area. For example,
    56   mistaken: in fact, they are still an active research area. For example,
    57   <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">this paper</A> 
    57   <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">this paper</A> 
    58   about regular expression matching and derivatives was presented just last summer at the international 
    58   about regular expression matching and derivatives was presented just last summer at the international 
    59   FLOPS'14 conference. The task in this project is to implement their results.</p>
    59   FLOPS'14 conference. The task in this project is to implement their results and use them for lexing.</p>
    60 
    60 
    61   <p>The background for this project is that some regular expressions are 
    61   <p>The background for this project is that some regular expressions are 
    62   &ldquo;<A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">evil</A>&rdquo;
    62   &ldquo;<A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">evil</A>&rdquo;
    63   and can &ldquo;stab you in the back&rdquo; according to
    63   and can &ldquo;stab you in the back&rdquo; according to
    64   this <A HREF="http://peterscott.github.io/2013/01/17/regular-expressions-will-stab-you-in-the-back/">blog post</A>.
    64   this <A HREF="http://peterscott.github.io/2013/01/17/regular-expressions-will-stab-you-in-the-back/">blog post</A>.
   104   <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">FLOPS'14-paper</A> mentioned 
   104   <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">FLOPS'14-paper</A> mentioned 
   105   above claim they are even faster than me and can deal with even more features of regular expressions
   105   above claim they are even faster than me and can deal with even more features of regular expressions
   106   (for example subexpression matching, which my rainy-afternoon matcher cannot). I am sure they thought
   106   (for example subexpression matching, which my rainy-afternoon matcher cannot). I am sure they thought
   107   about the problem much longer than a single afternoon. The task 
   107   about the problem much longer than a single afternoon. The task 
   108   in this project is to find out how good they actually are by implementing the results from their paper. 
   108   in this project is to find out how good they actually are by implementing the results from their paper. 
   109   Their approach is based on the concept of derivatives.
   109   Their approach to regular expression matching is also based on the concept of derivatives.
   110   I used them once myself in a <A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Publications/rexp.pdf">paper</A> 
   110   I used derivatives very successfully once for something completely different in a
   111   in order to prove the <A HREF="http://en.wikipedia.org/wiki/Myhill–Nerode_theorem">Myhill-Nerode theorem</A>.
   111   <A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Publications/rexp.pdf">paper</A> 
       
   112   about the <A HREF="http://en.wikipedia.org/wiki/Myhill–Nerode_theorem">Myhill-Nerode theorem</A>.
   112   So I know they are worth their money. Still, it would be interesting to actually compare their results
   113   So I know they are worth their money. Still, it would be interesting to actually compare their results
   113   with my simple rainy-afternoon matcher and potentially &ldquo;blow away&rdquo; the regular expression matchers 
   114   with my simple rainy-afternoon matcher and potentially &ldquo;blow away&rdquo; the regular expression matchers 
   114   in Python and Ruby (and possibly in Scala too). The application would be to implement a fast lexer for
   115   in Python and Ruby (and possibly in Scala too). The application would be to implement a fast lexer for
   115   programming languages. 
   116   programming languages. 
   116   </p>
   117   </p>
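  <p>To give a feel for matching with derivatives, here is a minimal sketch in Python.
  It assumes the textbook Brzozowski construction with a few simplification rules; it is
  not the algorithm from the FLOPS'14 paper, which also handles subexpression matching.
  All names (<code>Rexp</code>, <code>nullable</code>, <code>der</code>, <code>matches</code>)
  are illustrative and do not come from the paper or from any library.</p>

```python
from dataclasses import dataclass


class Rexp:
    """Base class for the regular expression constructors below."""


@dataclass(frozen=True)
class Zero(Rexp):      # matches no string at all
    pass


@dataclass(frozen=True)
class One(Rexp):       # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Rexp):       # matches the single character c
    c: str


@dataclass(frozen=True)
class Alt(Rexp):       # r1 | r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Seq(Rexp):       # r1 followed by r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Star(Rexp):      # zero or more repetitions of r
    r: Rexp


def nullable(r):
    """True if r can match the empty string."""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, (Zero, Chr)):
        return False
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    raise TypeError(f"not a regular expression: {r!r}")


def simp_alt(r1, r2):
    """Build r1 | r2, dropping Zero and duplicate alternatives."""
    if isinstance(r1, Zero):
        return r2
    if isinstance(r2, Zero) or r1 == r2:
        return r1
    return Alt(r1, r2)


def simp_seq(r1, r2):
    """Build r1 r2, simplifying when one side is Zero or One."""
    if isinstance(r1, Zero) or isinstance(r2, Zero):
        return Zero()
    if isinstance(r1, One):
        return r2
    if isinstance(r2, One):
        return r1
    return Seq(r1, r2)


def der(c, r):
    """Derivative of r with respect to c: what r still has to match after reading c."""
    if isinstance(r, (Zero, One)):
        return Zero()
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return simp_alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = simp_seq(der(c, r.r1), r.r2)
        # If r1 can match the empty string, c may also be consumed by r2.
        return simp_alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return simp_seq(der(c, r.r), r)
    raise TypeError(f"not a regular expression: {r!r}")


def matches(r, s):
    """Take the derivative for each character of s, then test for nullability."""
    for c in s:
        r = der(c, r)
    return nullable(r)


# An "evil" expression in the sense of the ReDoS examples: (a*)*b
EVIL = Seq(Star(Star(Chr("a"))), Chr("b"))
```

  <p>With these definitions, <code>matches(EVIL, "a" * 30)</code> needs one derivative step per
  character and no backtracking. How this compares in practice with the matchers in Python, Ruby
  and Scala is exactly what the project is meant to find out.</p>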
   621 </TD>
   622 </TD>
   622 </TR>
   623 </TR>
   623 </TABLE>
   624 </TABLE>
   624 
   625 
   625 <P>
   626 <P>
   626 <!-- hhmts start --> Last modified: Sun Nov  9 21:37:30 GMT 2014 <!-- hhmts end -->
   627 <!-- hhmts start --> Last modified: Sun Nov  9 21:47:12 GMT 2014 <!-- hhmts end -->
   627 <a href="http://validator.w3.org/check/referer">[Validate this page.]</a>
   628 <a href="http://validator.w3.org/check/referer">[Validate this page.]</a>
   628 </BODY>
   629 </BODY>
   629 </HTML>
   630 </HTML>