| author | Christian Urban <christian dot urban at kcl dot ac dot uk> | 
| Wed, 21 Nov 2012 02:34:37 +0000 | |
| changeset 68 | a09ecb1e7384 | 
| parent 64 | 2d625418c011 | 
| permissions | -rw-r--r-- | 
| 64 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 1 | <b>MSc Projects</b> | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 2 | |
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 3 | <p> | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 4 | start of paragraph. <cyan> a <red>cyan</red> word</cyan> normal again something longer. | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 5 | </p> | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 6 | |
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 7 | |
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 8 | <p><b>Description:</b> | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 9 | <a>Regular expressions</a> are extremely useful for many text-processing tasks such as finding patterns in texts, | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 10 | lexing programs, syntax highlighting and so on. Given that regular expressions were | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 11 | introduced in 1950 by <a>Stephen Kleene</a>, you might think | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 12 | regular expressions have since been studied and implemented to death. But you would definitely be mistaken: in fact they are still | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 13 | an active research area. For example | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 14 | <a>this paper</a> | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 15 | about regular expression matching and partial derivatives was presented this summer at the international | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 16 | PPDP'12 conference. The task in this project is to implement the results from this paper.</p> | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 17 | |
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 18 | <p>The background for this project is that some regular expressions are | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 19 | <a>evil</a> | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 20 | and can stab you in the back; according to | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 21 | this <a>blog post</a>. | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 22 | For example, if you use in <a>Python</a> or | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 23 | in <a>Ruby</a> (probably also in other mainstream programming languages) the | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 24 |   innocently looking regular expression a?{28}a{28} and match it, say, against the string 
 | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 25 | <red>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</red> (that is 28 as), you will soon notice that your CPU usage goes to 100%. In fact, | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 26 | Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself: | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 27 | <a>re.py</a> (Python version) and | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 28 | <a>re.rb</a> | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 29 | (Ruby version). You can imagine an attacker | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 30 | mounting a nice <a>DoS attack</a> against | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 31 | your program if it contains such an evil regular expression. Actually | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 32 | <a>Scala</a> (and also Java) are almost immune from such | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 33 | attacks as they can deal with strings of up to 4,300 as in less than a second. But if you scale | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 34 | the regular expression and string further to, say, 4,600 as, then you get a | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 35 | StackOverflowError | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 36 | potentially crashing your program. | 
| 
2d625418c011
added everything
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 37 | </p> |