bsc-projects-16.html
changeset 457 3feaf8bc3e48
parent 456 9eea20ad0caf
child 458 0647d8161a84
456:9eea20ad0caf 457:3feaf8bc3e48
     1 <?xml version="1.0" encoding="utf-8"?>
     1 <?xml version="1.0" encoding="utf-8"?>
     2 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
     2 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
     3 <HEAD>
     3 <HEAD>
     4 <TITLE>2015/16 MSc Projects</TITLE>
     4 <TITLE>2016/17 BSc Projects</TITLE>
     5 <BASE HREF="http://www.inf.kcl.ac.uk/staff/urbanc/">
     5 <BASE HREF="http://www.inf.kcl.ac.uk/staff/urbanc/">
     6 <script type="text/javascript" src="striper.js"></script>
     6 <script type="text/javascript" src="striper.js"></script>
     7 <link rel="stylesheet" href="nominal.css">
     7 <link rel="stylesheet" href="nominal.css">
     8 <script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
     8 <script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
     9 </script>
     9 </script>
    33 <H2>2015/16 MSc Projects</H2>
    33 <H2>2015/16 MSc Projects</H2>
    34 <H4>Supervisor: Christian Urban</H4> 
    34 <H4>Supervisor: Christian Urban</H4> 
    35 <H4>Email: christian dot urban at kcl dot ac dot uk,  Office: Strand Building S1.27</H4>
    35 <H4>Email: christian dot urban at kcl dot ac dot uk,  Office: Strand Building S1.27</H4>
    36 <H4>If you are interested in a project, please send me an email and we can discuss details. Please include
    36 <H4>If you are interested in a project, please send me an email and we can discuss details. Please include
    37 a short description about your programming skills and Computer Science background in your first email. 
    37 a short description about your programming skills and Computer Science background in your first email. 
    38 I will also need your King's username in order to book the project for you. Thanks.</H4> 
    38 Thanks.</H4> 
    39 
    39 
    40 <H4>Note that besides being a lecturer at the theoretical end of Computer Science, I am also a passionate
    40 <H4>Note that besides being a lecturer at the theoretical end of Computer Science, I am also a passionate
    41     <A HREF="http://en.wikipedia.org/wiki/Hacker_(programmer_subculture)">hacker</A> &hellip;
    41     <A HREF="http://en.wikipedia.org/wiki/Hacker_(programmer_subculture)">hacker</A> &hellip;
    42     defined as &ldquo;a person who enjoys exploring the details of programmable systems and 
    42     defined as &ldquo;a person who enjoys exploring the details of programmable systems and 
    43     stretching their capabilities, as opposed to most users, who prefer to learn only the minimum 
    43     stretching their capabilities, as opposed to most users, who prefer to learn only the minimum 
    56   <A HREF="http://en.wikipedia.org/wiki/Regular_expression">Regular expressions</A> 
    56   <A HREF="http://en.wikipedia.org/wiki/Regular_expression">Regular expressions</A> 
    57   are extremely useful for many text-processing tasks, such as finding patterns in texts,
    57   are extremely useful for many text-processing tasks, such as finding patterns in texts,
    58   lexing programs, syntax highlighting and so on. Given that regular expressions were
    58   lexing programs, syntax highlighting and so on. Given that regular expressions were
    59   introduced in the 1950s by <A HREF="http://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</A>,
    59   introduced in the 1950s by <A HREF="http://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</A>,
    60   you might think regular expressions have since been studied and implemented to death. But you would definitely be
    60   you might think regular expressions have since been studied and implemented to death. But you would definitely be
    61   mistaken: in fact they are still an active research area. For example
    61   mistaken: in fact they are still an active research area. Off the top of my head, I can point
       
    62   you to research papers that appeared in the last few years.
       
    63   For example
    62   <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">this paper</A> 
    64   <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">this paper</A> 
    63   about regular expression matching and derivatives was presented just last summer at the international 
    65   about regular expression matching and derivatives was presented just last summer at the international 
    64   FLOPS'14 conference. The task in this project is to implement their results and use them for lexing.</p>
    66   FLOPS'14 conference. The task in this project is to implement their results and use them for lexing.</p>
    65 
    67 
    66   <p>The background for this project is that some regular expressions are 
    68   <p>The background for this project is that some regular expressions are 
    70   For example, if you use in <A HREF="http://www.python.org">Python</A> or 
    72   For example, if you use in <A HREF="http://www.python.org">Python</A> or 
    71   in <A HREF="http://www.ruby-lang.org/en/">Ruby</A> (or also in a number of other mainstream programming languages) the 
    73   in <A HREF="http://www.ruby-lang.org/en/">Ruby</A> (or also in a number of other mainstream programming languages) the 
    72   innocent-looking regular expression <code>a?{28}a{28}</code> and match it, say, against the string 
    74   innocent-looking regular expression <code>a?{28}a{28}</code> and match it, say, against the string 
    73   <code>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</code> (that is 28 <code>a</code>s), you will soon notice that your CPU usage goes to 100%. In fact,
    75   <code>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</code> (that is 28 <code>a</code>s), you will soon notice that your CPU usage goes to 100%. In fact,
    74   Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself:
    76   Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself:
    75   <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/re.py">re.py</A> (Python version) and 
    77   <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/catastrophic.py">catastrophic.py</A> (Python version) and 
    76   <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/re.rb">re.rb</A> 
    78   <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/catastrophic.rb">catastrophic.rb</A> 
    77   (Ruby version). You can imagine an attacker
    79   (Ruby version). Here is a similar problem in Java: <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/catastrophic.rb">catastrophic.java</A>.
       
    80   </p> 
       
    81 
       
    82   <p>
       
    83   You can imagine an attacker
    78   mounting a nice <A HREF="http://en.wikipedia.org/wiki/Denial-of-service_attack">DoS attack</A> against 
    84   mounting a nice <A HREF="http://en.wikipedia.org/wiki/Denial-of-service_attack">DoS attack</A> against 
    79   your program if it contains such an &ldquo;evil&rdquo; regular expression. Actually 
    85   your program if it contains such an &ldquo;evil&rdquo; regular expression. But it can also happen by accident:
    80   <A HREF="http://www.scala-lang.org/">Scala</A> (and also Java) are almost immune from such
    86   on 20 July 2016 the website <A HREF="http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016">Stack Exchange</A>
    81   attacks as they can deal with strings of up to 4,300 <code>a</code>s in less than a second. But if you scale
    87   was knocked offline because of an evil regular expression. One of their engineers talks about it in this
       
    88   <A HREF="https://vimeo.com/112065252">video</A>. A similar problem needed to be fixed in the
       
    89   <A HREF="http://davidvgalbraith.com/how-i-fixed-atom/">Atom</A> editor.
       
    90   A few implementations of regular expression matchers are almost immune to such problems.
       
    91   For example, <A HREF="http://www.scala-lang.org/">Scala</A> can deal with strings of up to 4,300 <code>a</code>s in less than a second. But if you scale
    82   the regular expression and string further to, say, 4,600 <code>a</code>s, then you get a <code>StackOverflowError</code> 
    92   the regular expression and string further to, say, 4,600 <code>a</code>s, then you get a <code>StackOverflowError</code> 
    83   potentially crashing your program. Moreover (besides the "minor" problem of being painfully slow) according to this
    93   potentially crashing your program. Moreover (besides the "minor" problem of being painfully slow) according to this
    84   <A HREF="http://www.haskell.org/haskellwiki/Regex_Posix">report</A>
    94   <A HREF="http://www.haskell.org/haskellwiki/Regex_Posix">report</A>
    85   nearly all regular expression matchers using the POSIX rules are actually buggy.
    95   nearly all regular expression matchers using the POSIX rules are actually buggy.
    86   </p>
    96   </p>
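  <p>
  The slowdown described above can be reproduced with a few lines of Python. This is only a minimal sketch
  and not the linked <code>catastrophic.py</code>: the helper name <code>time_match</code> is made up for
  illustration, and the pattern is spelled <code>(a?){n}a{n}</code> with an explicit group, on the assumption
  that Python's <code>re</code> module does not accept a counted repetition directly after <code>a?</code>.
  The sizes are kept small on purpose, because a backtracking matcher typically needs time that grows
  exponentially in <code>n</code>.
  </p>

```python
import re
import time


def time_match(n):
    """Match (a?){n}a{n} against n copies of 'a'; return (matched, seconds)."""
    pattern = re.compile("(a?){%d}a{%d}" % (n, n))
    text = "a" * n
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Small values only: larger n quickly becomes very slow.
    for n in range(10, 21, 2):
        matched, seconds = time_match(n)
        print(n, matched, round(seconds, 4))
```

  <p>
  Increasing the upper bound of the loop step by step towards 28 shows how quickly the matching time gets out of hand.
  </p>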
    94   official matcher maxes out at 4,600 <code>a</code>s). My matcher is approximately
   104   official matcher maxes out at 4,600 <code>a</code>s). My matcher is approximately
    95   85 lines of code and based on the concept of 
   105   85 lines of code and based on the concept of 
    96   <A HREF="http://lambda-the-ultimate.org/node/2293">derivatives of regular expressions</A>.
   106   <A HREF="http://lambda-the-ultimate.org/node/2293">derivatives of regular expressions</A>.
    97   These derivatives were introduced in 1964 by <A HREF="http://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)">
   107   These derivatives were introduced in 1964 by <A HREF="http://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)">
    98   Janusz Brzozowski</A>, but according to this
   108   Janusz Brzozowski</A>, but according to this
    99   <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">paper</A> had been lost in the &ldquo;sands of time&rdquo;.
   109   <A HREF="https://www.cs.kent.ac.uk/people/staff/sao/documents/jfp09.pdf">paper</A> had been lost in the &ldquo;sands of time&rdquo;.
   100   The advantage of derivatives is that they completely side-step the usual 
   110   The advantage of derivatives is that they completely side-step the usual 
   101   <A HREF="http://hackingoff.com/compilers/regular-expression-to-nfa-dfa">translations</A> of regular expressions
   111   <A HREF="http://hackingoff.com/compilers/regular-expression-to-nfa-dfa">translations</A> of regular expressions
   102   into NFAs or DFAs, which can introduce the exponential behaviour exhibited by the regular
   112   into NFAs or DFAs, which can introduce the exponential behaviour exhibited by the regular
   103   expression matchers in Python and Ruby.
   113   expression matchers in Python and Ruby.
   104   </p>
   114   </p>
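  <p>
  To give an idea of what a derivative-based matcher looks like, here is a small sketch in Python. It is not
  the 85-line matcher mentioned above; all class and function names (<code>Zero</code>, <code>One</code>,
  <code>Chr</code>, <code>Alt</code>, <code>Seq</code>, <code>Star</code>, <code>der</code>,
  <code>nullable</code>, <code>matches</code>, <code>evil</code>) are chosen for illustration only.
  The idea is that <code>der(c, r)</code> builds a regular expression for what remains to be matched after
  the character <code>c</code> has been read, and a string is accepted if the regular expression left over
  at the end can match the empty string. Alternatives are kept as a set here so that duplicates disappear,
  which is one possible way to stop the derivatives from growing too much.
  </p>

```python
from dataclasses import dataclass
from functools import reduce


class Rexp:
    """Base class for regular expressions."""


@dataclass(frozen=True)
class Zero(Rexp):      # matches no string at all
    pass


@dataclass(frozen=True)
class One(Rexp):       # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Rexp):       # matches the single character c
    c: str


@dataclass(frozen=True)
class Alt(Rexp):       # matches if any alternative in rs matches
    rs: frozenset


@dataclass(frozen=True)
class Seq(Rexp):       # matches r1 followed by r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Star(Rexp):      # matches zero or more repetitions of r
    r: Rexp


def alt(r1, r2):
    """Build r1 + r2, dropping Zero and duplicate alternatives."""
    parts = set()
    for r in (r1, r2):
        if isinstance(r, Alt):
            parts |= r.rs
        elif not isinstance(r, Zero):
            parts.add(r)
    if not parts:
        return Zero()
    if len(parts) == 1:
        return next(iter(parts))
    return Alt(frozenset(parts))


def seq(r1, r2):
    """Build r1 followed by r2, simplifying with Zero and One."""
    if isinstance(r1, Zero) or isinstance(r2, Zero):
        return Zero()
    if isinstance(r1, One):
        return r2
    if isinstance(r2, One):
        return r1
    return Seq(r1, r2)


def nullable(r):
    """Can r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return any(nullable(x) for x in r.rs)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero and Chr


def der(c, r):
    """Derivative of r with respect to the character c."""
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return reduce(alt, (der(c, x) for x in r.rs), Zero())
    if isinstance(r, Seq):
        left = seq(der(c, r.r1), r.r2)
        return alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return seq(der(c, r.r), r)
    return Zero()  # Zero and One


def matches(r, s):
    """r matches s iff the derivative of r by all of s is nullable."""
    for c in s:
        r = der(c, r)
    return nullable(r)


def evil(n):
    """The regular expression a?{n}a{n} from the text, built by hand."""
    a = Chr("a")
    r = One()
    for _ in range(n):
        r = seq(a, r)              # a{n}
    for _ in range(n):
        r = seq(alt(a, One()), r)  # a?{n} in front
    return r
```

  <p>
  For example, <code>matches(evil(10), "a" * 10)</code> decides the match by taking ten derivatives and one
  final <code>nullable</code> test, without any backtracking over the string.
  </p>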
   122 
   132 
   123   <p>
   133   <p>
   124   <B>Literature:</B> 
   134   <B>Literature:</B> 
   125   The place to start with this project is obviously this
   135   The place to start with this project is obviously this
   126   <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">paper</A>
   136   <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">paper</A>
   127   and this <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">one</A>.
   137   and this <A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Publications/posix.pdf">one</A>.
   128   Traditional methods for regular expression matching are explained
   138   Traditional methods for regular expression matching are explained
   129   in the Wikipedia articles 
   139   in the Wikipedia articles 
   130   <A HREF="http://en.wikipedia.org/wiki/DFA_minimization">here</A> and 
   140   <A HREF="http://en.wikipedia.org/wiki/DFA_minimization">here</A> and 
   131   <A HREF="http://en.wikipedia.org/wiki/Powerset_construction">here</A>.
   141   <A HREF="http://en.wikipedia.org/wiki/Powerset_construction">here</A>.
   132   The authoritative <A HREF="http://infolab.stanford.edu/~ullman/ialc.html">book</A>
   142   The authoritative <A HREF="http://infolab.stanford.edu/~ullman/ialc.html">book</A>
   148   in functional languages like
   158   in functional languages like
   149   <A HREF="http://www.scala-lang.org/">Scala</A>,
   159   <A HREF="http://www.scala-lang.org/">Scala</A>,
   150   <A HREF="http://fsharp.org">F#</A>, 
   160   <A HREF="http://fsharp.org">F#</A>, 
   151   <A HREF="http://en.wikipedia.org/wiki/Standard_ML">ML</A>,  
   161   <A HREF="http://en.wikipedia.org/wiki/Standard_ML">ML</A>,  
   152   <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>, etc. Python and other non-functional languages
   162   <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>, etc. Python and other non-functional languages
   153   can also be used, but seem much less convenient. If you attend my Formal Languages and
   163   can also be used, but seem much less convenient. If you attend my Compilers and Formal Languages
   154   Automata module, that would obviously give you a head-start with this project.
   164   module, that would obviously give you a head-start with this project.
   155   </p>
   165   </p>
   156   
   166   
   157 <li> <H4>[CU2] A Compiler for a small Programming Language</H4>
   167 <li> <H4>[CU2] A Compiler for a small Programming Language</H4>
   158 
   168 
   159   <p>
   169   <p>