MSc Projects


Description: Regular expressions are extremely useful for many text-processing tasks, such as finding patterns in texts, lexing programs and syntax highlighting. Given that regular expressions were introduced in the 1950s by Stephen Kleene, you might think they have since been studied and implemented to death. But you would be mistaken: they are still an active research area. For example, this paper about regular expression matching and partial derivatives was presented this summer at the international PPDP'12 conference. The task in this project is to implement the results from this paper.

The background for this project is that some regular expressions are evil and can stab you in the back, according to this blog post. For example, if you use in Python or in Ruby (and probably also in other mainstream programming languages) the innocent-looking regular expression a?{28}a{28} and match it against, say, the string aaaaaaaaaaaaaaaaaaaaaaaaaaaa (that is, 28 a's), you will soon notice that your CPU usage goes to 100%. In fact, Python and Ruby need approximately 30 seconds of hard work to match this string. You can try it for yourself: re.py (Python version) and re.rb (Ruby version). You can imagine an attacker mounting a nice DoS attack against your program if it contains such an evil regular expression. Scala (and also Java) is almost immune to such attacks, as it can deal with strings of up to 4,300 a's in less than a second. But if you scale the regular expression and string further to, say, 4,600 a's, then you get a StackOverflowError, potentially crashing your program.
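The blow-up described above can be observed at smaller, safer sizes. The following is only a minimal sketch, not the re.py script mentioned above: the function name evil_match_time is made up for illustration, and the pattern is written with an explicit group as (a?){n}a{n}, which is an assumption about how the notation a?{n}a{n} is meant to be read. The absolute timings depend on your machine and Python version, so no expected output is given.

```python
import re
import time


def evil_match_time(n):
    """Match (a?){n}a{n} against a string of n a's and time it.

    Hypothetical helper for illustration; returns (matched, seconds).
    """
    # Assumption: the pattern a?{n}a{n} is written with an explicit group.
    pattern = re.compile("(a?){%d}a{%d}" % (n, n))
    text = "a" * n

    start = time.perf_counter()
    matched = pattern.fullmatch(text) is not None
    seconds = time.perf_counter() - start
    return matched, seconds


# Keep n small: with a backtracking matcher the running time is expected
# to grow roughly exponentially in n.
for n in (8, 12, 16, 20):
    matched, seconds = evil_match_time(n)
    print("n = %2d  matched = %s  time = %.4f s" % (n, matched, seconds))
```

If the timings roughly double each time n grows by one, that is the exponential backtracking behaviour at work; increase n carefully if you want to approach the 28 a's from the example.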