author | Christian Urban <christian dot urban at kcl dot ac dot uk> |
Wed, 21 Nov 2012 02:29:08 +0000 | |
changeset 67 | 1419b60f6b0e |
parent 64 | 2d625418c011 |
permissions | -rw-r--r-- |
64
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
1 |
<b>MSc Projects</b> |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
2 |
|
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
3 |
<p> |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
4 |
start of paragraph. <cyan> a <red>cyan</red> word</cyan> normal again something longer. |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
5 |
</p> |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
6 |
|
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
7 |
|
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
8 |
<p><b>Description:</b> |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
9 |
<a>Regular expressions</a> are extremely useful for many text-processing tasks such as finding patterns in texts, |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
10 |
lexing programs, syntax highlighting and so on. Given that regular expressions were |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
11 |
introduced in 1950 by <a>Stephen Kleene</a>, you might think |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
12 |
regular expressions have since been studied and implemented to death. But you would definitely be mistaken: in fact they are still |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
13 |
an active research area. For example |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
14 |
<a>this paper</a> |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
15 |
about regular expression matching and partial derivatives was presented this summer at the international |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
16 |
PPDP'12 conference. The task in this project is to implement the results from this paper.</p> |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
17 |
|
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
18 |
<p>The background for this project is that some regular expressions are |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
19 |
<a>evil</a> |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
20 |
and can stab you in the back; according to |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
21 |
this <a>blog post</a>. |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
22 |
For example, if you use in <a>Python</a> or |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
23 |
in <a>Ruby</a> (probably also in other mainstream programming languages) the |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
24 |
innocently looking regular expression a?{28}a{28} and match it, say, against the string |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
25 |
<red>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</red> (that is 28 as), you will soon notice that your CPU usage goes to 100%. In fact, |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
26 |
Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself: |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
27 |
<a>re.py</a> (Python version) and |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
28 |
<a>re.rb</a> |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
29 |
(Ruby version). You can imagine an attacker |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
30 |
mounting a nice <a>DoS attack</a> against |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
31 |
your program if it contains such an evil regular expression. Actually |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
32 |
<a>Scala</a> (and also Java) are almost immune from such |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
33 |
attacks as they can deal with strings of up to 4,300 as in less than a second. But if you scale |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
34 |
the regular expression and string further to, say, 4,600 as, then you get a |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
35 |
StackOverflowError |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
36 |
potentially crashing your program. |
2d625418c011
added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
37 |
</p> |