test.html
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Wed, 21 Nov 2012 02:29:08 +0000
changeset 67 1419b60f6b0e
parent 64 2d625418c011
permissions -rw-r--r--
updated
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
64
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     1
<b>MSc Projects</b>
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     2
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     3
<p>
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     4
start of paragraph. <cyan> a <red>cyan</red> word</cyan> normal again something longer.
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     5
</p> 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     6
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     7
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     8
 <p><b>Description:</b>  
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     9
  <a>Regular expressions</a> are extremely useful for many text-processing tasks such as finding patterns in texts,
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    10
  lexing programs, syntax highlighting and so on. Given that regular expressions were
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    11
  introduced in 1950 by <a>Stephen Kleene</a>, you might think 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    12
  regular expressions have since been studied and implemented to death. But you would definitely be mistaken: in fact they are still
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    13
  an active research area. For example
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    14
  <a>this paper</a> 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    15
  about regular expression matching and partial derivatives was presented this summer at the international 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    16
  PPDP'12 conference. The task in this project is to implement the results from this paper.</p>
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    17
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    18
  <p>The background for this project is that some regular expressions are 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    19
  <a>evil</a>
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    20
  and can stab you in the back; according to
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    21
  this <a>blog post</a>.
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    22
  For example, if you use in <a>Python</a> or 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    23
  in <a>Ruby</a> (probably also in other mainstream programming languages) the 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    24
  innocently looking regular expression a?{28}a{28} and match it, say, against the string 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    25
  <red>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</red> (that is 28 as), you will soon notice that your CPU usage goes to 100%. In fact,
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    26
  Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself:
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    27
  <a>re.py</a> (Python version) and 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    28
  <a>re.rb</a> 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    29
  (Ruby version). You can imagine an attacker
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    30
  mounting a nice <a>DoS attack</a> against 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    31
  your program if it contains such an evil regular expression. Actually 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    32
  <a>Scala</a> (and also Java) are almost immune from such
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    33
  attacks as they can deal with strings of up to 4,300 as in less than a second. But if you scale
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    34
  the regular expression and string further to, say, 4,600 as, then you get a 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    35
  StackOverflowError 
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    36
  potentially crashing your program.
2d625418c011 added everything
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    37
  </p>