<li> <H4>[CU1] Regular Expression Matching and Derivatives</H4>

<p>
<B>Description:</b>
<A HREF="http://en.wikipedia.org/wiki/Regular_expression">Regular expressions</A>
are extremely useful for many text-processing tasks, such as finding patterns in texts,
lexing programs, syntax highlighting and so on. Given that regular expressions were
introduced in the 1950s by <A HREF="http://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</A>,
you might think regular expressions have since been studied and implemented to death. But you would definitely be
mistaken: in fact they are still an active research area. For example,
<A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">this paper</A>
about regular expression matching and derivatives was presented just last summer at the international
FLOPS'14 conference. The task in this project is to implement their results and use them for lexing.</p>

<p>The background for this project is that some regular expressions are
“<A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">evil</A>”
and can “stab you in the back” according to
this <A HREF="http://peterscott.github.io/2013/01/17/regular-expressions-will-stab-you-in-the-back/">blog post</A>.
<A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">FLOPS'14-paper</A> mentioned
above claim they are even faster than me and can deal with even more features of regular expressions
(for example subexpression matching, which my rainy-afternoon matcher cannot). I am sure they thought
about the problem much longer than a single afternoon. The task
in this project is to find out how good they actually are by implementing the results from their paper.
Their approach to regular expression matching is also based on the concept of derivatives.
I used derivatives very successfully once for something completely different in a
<A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Publications/rexp.pdf">paper</A>
about the <A HREF="http://en.wikipedia.org/wiki/Myhill–Nerode_theorem">Myhill-Nerode theorem</A>.
So I know they are worth their money. Still, it would be interesting to actually compare their results
with my simple rainy-afternoon matcher and potentially “blow away” the regular expression matchers
in Python and Ruby (and possibly in Scala too). The application would be to implement a fast lexer for
programming languages.
</p>
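<p>To give a rough idea of what matching with derivatives looks like, here is a
minimal sketch in Python. It follows the classical Brzozowski definition of derivatives and is
not the algorithm from the FLOPS'14 paper (in particular it does no subexpression matching);
all names (<code>Rexp</code>, <code>nullable</code>, <code>der</code>, <code>matches</code>
and the helpers <code>alt</code> and <code>seq</code>) are chosen for illustration only.</p>

```python
from dataclasses import dataclass


class Rexp:
    """Base class for regular expressions (illustrative names throughout)."""


@dataclass(frozen=True)
class Zero(Rexp):      # matches no string at all
    pass


@dataclass(frozen=True)
class One(Rexp):       # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Rexp):       # matches the single character c
    c: str


@dataclass(frozen=True)
class Alt(Rexp):       # r1 | r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Seq(Rexp):       # r1 followed by r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Star(Rexp):      # zero or more repetitions of r
    r: Rexp


def alt(r1, r2):
    """Build r1 | r2, dropping Zero branches and identical alternatives."""
    if r1 == Zero():
        return r2
    if r2 == Zero() or r1 == r2:
        return r1
    return Alt(r1, r2)


def seq(r1, r2):
    """Build r1 followed by r2, simplifying Zero and One factors."""
    if r1 == Zero() or r2 == Zero():
        return Zero()
    if r1 == One():
        return r2
    if r2 == One():
        return r1
    return Seq(r1, r2)


def nullable(r):
    """Can r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False       # Zero and Chr


def der(c, r):
    """Derivative of r with respect to c: matches s exactly when r matches c + s."""
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = seq(der(c, r.r1), r.r2)
        return alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return seq(der(c, r.r), r)
    return Zero()      # Zero and One


def matches(r, s):
    """Take the derivative for each character, then test for nullability."""
    for c in s:
        r = der(c, r)
    return nullable(r)


# (a*)* b is one of the "evil" patterns for backtracking matchers
evil = Seq(Star(Star(Chr("a"))), Chr("b"))
print(matches(evil, "a" * 200 + "b"))   # True
print(matches(evil, "a" * 200))         # False
```

<p>In this sketch the simplifying constructors <code>alt</code> and <code>seq</code> are what keep the
derivatives small; on the example above the derivative stops growing after the first character,
so the running time stays linear in the length of the input.</p>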