diff -r e6868bd2942b -r 7717f20f0504 html.scala
--- a/html.scala	Wed Nov 21 09:04:11 2012 +0000
+++ b/html.scala	Fri Nov 23 14:08:31 2012 +0000
@@ -56,4 +56,44 @@
   case _::rest => interpret(rest, c, ctr)
 }
 
-interpret(T.fromFile("test.html"), 0, Nil)
+val test_string = """
+MSc Projects
+
+
+start of paragraph. a cyan word normal again something longer.
+
+
+
+Description:
+ Regular expressions are extremely useful for many text-processing tasks such as finding patterns in texts,
+ lexing programs, syntax highlighting and so on. Given that regular expressions were
+ introduced in 1950 by Stephen Kleene, you might think
+ regular expressions have since been studied and implemented to death. But you would definitely be mistaken: in fact they are still
+ an active research area. For example
+ this paper
+ about regular expression matching and partial derivatives was presented this summer at the international
+ PPDP'12 conference. The task in this project is to implement the results from this paper.
+
+The background for this project is that some regular expressions are
+ evil
+ and can stab you in the back; according to
+ this blog post.
+ For example, if you use in Python or
+ in Ruby (probably also in other mainstream programming languages) the
+ innocently looking regular expression a?{28}a{28} and match it, say, against the string
+ aaaaaaaaaaaaaaaaaaaaaaaaaaaa (that is 28 as), you will soon notice that your CPU usage goes to 100%. In fact,
+ Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself:
+ re.py (Python version) and
+ re.rb
+ (Ruby version). You can imagine an attacker
+ mounting a nice DoS attack against
+ your program if it contains such an evil regular expression. Actually
+ Scala (and also Java) are almost immune from such
+ attacks as they can deal with strings of up to 4,300 as in less than a second. But if you scale
+ the regular expression and string further to, say, 4,600 as, then you get a
+ StackOverflowError
+ potentially crashing your program.
+
+"""
+
+interpret(T.fromString(test_string), 0, Nil)
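
The test string in this changeset describes matching the pattern a?{28}a{28} against a string of 28 a's. A minimal script-style Scala sketch of that experiment, which is not part of the changeset, might look like the following. The helper names evilPattern, evilInput and timeMatch are hypothetical, and writing the optional part as the group (a?){n} is an assumption about how the quantifier is meant to be read. Timings, and the input size at which a StackOverflowError appears, depend on the JVM and its stack size, so no figures are claimed here.

```scala
// Hypothetical helpers; none of these names appear in html.scala.

// Builds the pattern (a?){n}a{n}: n optional a's followed by n mandatory a's.
def evilPattern(n: Int): String = s"(a?){$n}a{$n}"

// Builds the input: a string consisting of n a's.
def evilInput(n: Int): String = "a" * n

// Matches the whole input against the pattern using java.util.regex
// (via String.matches) and returns the result with the elapsed milliseconds.
def timeMatch(n: Int): (Boolean, Long) = {
  val start = System.nanoTime()
  val matched = evilInput(n).matches(evilPattern(n))
  (matched, (System.nanoTime() - start) / 1000000L)
}

// Small sizes only; larger n may run for a long time or overflow the stack.
for (n <- List(5, 10, 15)) {
  val (matched, millis) = timeMatch(n)
  println(s"n = $n matched = $matched time = $millis ms")
}
```

Raising n step by step is one way to observe where a given regex engine starts to struggle on this family of patterns.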