msc-projects-12.html
author Christian Urban <urbanc@in.tum.de>
Tue, 06 Nov 2012 20:31:14 +0000 (2012-11-06)
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<TITLE>2012/13 MSc Projects</TITLE>
<BASE HREF="http://www.inf.kcl.ac.uk/staff/urbanc/">
<script type="text/javascript" src="striper.js"></script>
<link rel="stylesheet" href="nominal.css">
</HEAD>
<BODY TEXT="#000000" 
      BGCOLOR="#4169E1" 
      LINK="#0000EF" 
      VLINK="#51188E" 
      ALINK="#FF0000"
      ONLOAD="striper('ul','striped','li','first,second')">



<TABLE WIDTH="100%" 
       BGCOLOR="#4169E1" 
       BORDER="0"   
       FRAME="border"  
       CELLPADDING="10"     
       CELLSPACING="2"
       RULES="all">

<TR>
<TD BGCOLOR="#FFFFFF" 
    WIDTH="75%" 
    VALIGN="TOP">

<H2>2012/13 MSc Projects</H2>
<H4>Supervisor: Christian Urban</H4> 
<H4>Email: christian dot urban at kcl dot ac dot uk,  Office: Strand Building S1.27</H4>
<H4>If you are interested in a project, please send me an email and we can discuss details. Please include
a short description of your programming skills and Computer Science background in your first email. 
I will also need your King's username in order to book the project for you. Thanks.</H4> 

<H4>Note that besides being a lecturer at the theoretical end of Computer Science, I am also a passionate
    <A HREF="http://en.wikipedia.org/wiki/Hacker_(programmer_subculture)">hacker</A> &hellip;
    defined as &ldquo;a person who enjoys exploring the details of programmable systems and 
    stretching their capabilities, as opposed to most users, who prefer to learn only the minimum 
    necessary.&rdquo; I am always happy to supervise like-minded students.</H4>  

<ul class="striped">
<li> <H4>[CU1] Regular Expression Matching and Partial Derivatives</H4>

  <p>
  <B>Description:</b>  
  <A HREF="http://en.wikipedia.org/wiki/Regular_expression">Regular expressions</A> 
  are extremely useful for many text-processing tasks, such as finding patterns in texts,
  lexing programs, syntax highlighting and so on. Given that regular expressions were
  introduced in the 1950s by <A HREF="http://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</A>, you might think 
  regular expressions have since been studied to death. But you would definitely be mistaken: in fact they are still
  an active research area. For example
  <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">this paper</A> 
  about regular expression matching and partial derivatives was presented this summer at the international 
  PPDP'12 conference.</p>

  <p>The background for this project is that some regular expressions are 
  &quot;<A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">evil</A>&quot; 
  and can &quot;stab you in the back&quot; according to
  this recent <A HREF="http://tech.blog.cueup.com/regular-expressions-will-stab-you-in-the-back">blog post</A>.
  For example, if you use in <A HREF="http://www.python.org">Python</A> or 
  in <A HREF="http://www.ruby-lang.org/en/">Ruby</A> (probably also in other mainstream programming languages) the 
  innocent-looking regular expression <code>a?{28}a{28}</code> and match it, say, against the string 
  <code>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</code> (that is 28 <code>a</code>s), you will soon notice that your CPU usage goes to 100%. In fact,
  Python and Ruby need approximately 30 seconds for matching this string. You can try it for yourself:
  <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re.py">re.py</A> (Python version) and 
  <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re-internal.rb">re.rb</A> 
  (Ruby version). You can imagine an attacker
  mounting a nice <A HREF="http://en.wikipedia.org/wiki/Denial-of-service_attack">DoS attack</A> against 
  your program if it contains such an &quot;evil&quot; regular expression. Actually 
  <A HREF="http://www.scala-lang.org/">Scala</A> (and also Java) are almost immune to such
  attacks as they can deal with strings of up to 4,300 <code>a</code>s in less than a second. But if you scale
  the regular expression and string further to, say, 4,600 <code>a</code>s, you get a <code>StackOverflowError</code> 
  exception crashing your program.
  </p>
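
  <p>
  A minimal Python sketch of such an experiment is below. It is not the linked <code>re.py</code>; the
  function name <code>time_evil_match</code> is made up for this illustration, and the regular expression
  <code>a?{n}a{n}</code> is written out as <code>n</code> copies of <code>a?</code> followed by <code>n</code> copies
  of <code>a</code>. Keep <code>n</code> small: with a backtracking matcher the running time should roughly double
  with every increment.
  </p>

<pre><code>
import re
import time

def time_evil_match(n):
    """Match a?{n}a{n} against a string of n a's; return (matched, seconds)."""
    # a?{n}a{n} spelled out: n optional a's followed by n mandatory a's
    pattern = re.compile("a?" * n + "a" * n)
    text = "a" * n
    start = time.perf_counter()
    matched = pattern.fullmatch(text) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    # small values only: the time grows very quickly with n
    for n in (10, 14, 18):
        matched, seconds = time_evil_match(n)
        print(n, matched, round(seconds, 4))
</code></pre>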

  <p>
  On a rainy afternoon, I implemented 
  <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re3.scala">this</A> 
  regular expression matcher in Scala. It is not as fast as the official one in Scala, but
  it can match up to 11,000 <code>a</code>s in less than 5 seconds  without raising any exception
  (remember Python and Ruby both need nearly 30 seconds to process 28(!) <code>a</code>s, and Scala's
  official matcher maxes out at 4,600 <code>a</code>s). My matcher is approximately
  85 lines of code and based on the concept of 
  <A HREF="http://lambda-the-ultimate.org/node/2293">derivatives of regular expressions</A>.
  These derivatives were introduced in 1964 by <A HREF="http://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)">
  Janusz Brzozowski</A>, but according to this 
  <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">paper</A> had been lost in the &quot;sands of time&quot;.
  The advantage of derivatives is that they side-step completely the usual 
  <A HREF="http://hackingoff.com/compilers/regular-expression-to-nfa-dfa">translations</A> of regular expressions
  into NFAs or DFAs, which can introduce the exponential behaviour exhibited by the regular
  expression matchers in Python and Ruby.
  </p>
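
  <p>
  To give an idea of how little code the derivative approach needs, here is a minimal Python sketch of a
  matcher based on Brzozowski derivatives. It is not the Scala matcher linked above: the tuple encoding of
  regular expressions and all function names are assumptions made for this illustration. The constructor
  <code>ntimes</code> stands for the bounded repetition <code>r{n}</code>, and <code>alt</code> and <code>seq</code>
  apply a few simplifications so that the derivatives stay small.
  </p>

<pre><code>
# Regular expressions as nested tuples (an encoding assumed for this sketch).
ZERO = ("zero",)   # matches no string at all
ONE = ("one",)     # matches only the empty string

def char(c):
    return ("char", c)

def alt(r, s):
    # r | s, simplified where possible
    if r == ZERO:
        return s
    if s == ZERO or r == s:
        return r
    return ("alt", r, s)

def seq(r, s):
    # r followed by s, simplified where possible
    if r == ZERO or s == ZERO:
        return ZERO
    if r == ONE:
        return s
    if s == ONE:
        return r
    return ("seq", r, s)

def star(r):
    return ("star", r)

def ntimes(r, n):
    # exactly n copies of r
    return ("ntimes", r, n)

def opt(r):
    return alt(ONE, r)

def nullable(r):
    """Can r match the empty string?"""
    tag = r[0]
    if tag in ("one", "star"):
        return True
    if tag in ("zero", "char"):
        return False
    if tag == "ntimes":
        return r[2] == 0 or nullable(r[1])
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])   # seq

def der(c, r):
    """Derivative of r with respect to c: what may follow after reading c."""
    tag = r[0]
    if tag in ("zero", "one"):
        return ZERO
    if tag == "char":
        return ONE if r[1] == c else ZERO
    if tag == "alt":
        return alt(der(c, r[1]), der(c, r[2]))
    if tag == "star":
        return seq(der(c, r[1]), r)
    if tag == "ntimes":
        if r[2] == 0:
            return ZERO
        return seq(der(c, r[1]), ntimes(r[1], r[2] - 1))
    # seq: c is consumed by the first part, or by the second if the first can be empty
    left = seq(der(c, r[1]), r[2])
    if nullable(r[1]):
        return alt(left, der(c, r[2]))
    return left

def matches(r, s):
    # take the derivative for each character, then test for the empty string
    for c in s:
        r = der(c, r)
    return nullable(r)
</code></pre>

  <p>
  With these definitions the evil example is
  <code>seq(ntimes(opt(char("a")), 28), ntimes(char("a"), 28))</code>, and <code>matches</code> decides it
  for a string of 28 <code>a</code>s without any backtracking.
  </p>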

  <p>
  Now the guys from the 
  <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">PPDP'12-paper</A> mentioned 
  above claim they are even faster than me and can deal with even more features of regular expressions
  (for example subexpression matching, which my rainy-afternoon matcher lacks). I am sure they thought
  about the problem much longer than a single afternoon. The task 
  in this project is to find out how good they actually are by implementing the results from their paper. 
  Their approach is based on the concept of partial derivatives introduced in 1994 by
  <A HREF="http://reference.kfupm.edu.sa/content/p/a/partial_derivatives_of_regular_expressio_1319383.pdf">Valentin Antimirov</A>.
  I used them once myself in a <A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Publications/rexp.pdf">paper</A> 
  in order to prove the <A HREF="http://en.wikipedia.org/wiki/Myhill–Nerode_theorem">Myhill-Nerode theorem</A>.
  So I know they are worth their money. Still, it would be interesting to actually compare their results
  with my simple rainy-afternoon matcher and &quot;blow away&quot; the regular expression matchers in Python and Ruby (and possibly
  in Scala too).
  </p>
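
  <p>
  The difference between derivatives and partial derivatives is easiest to see in code. The following minimal
  Python sketch computes Antimirov-style partial derivatives, which yield a <I>set</I> of regular expressions
  instead of a single one. It only does plain matching and is not the sub-matching algorithm from the PPDP'12
  paper; the tuple encoding and the function names are assumptions made for this illustration.
  </p>

<pre><code>
# Regular expressions as nested tuples (an encoding assumed for this sketch).
ZERO = ("zero",)   # matches no string at all
ONE = ("one",)     # matches only the empty string

def char(c):
    return ("char", c)

def alt(r, s):
    return ("alt", r, s)

def star(r):
    return ("star", r)

def seq(r, s):
    # drop a leading ONE so that equal residuals are also equal as tuples
    return s if r == ONE else ("seq", r, s)

def nullable(r):
    """Can r match the empty string?"""
    tag = r[0]
    if tag in ("one", "star"):
        return True
    if tag in ("zero", "char"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])   # seq

def pder(c, r):
    """Partial derivative of r with respect to c, as a set of regular expressions."""
    tag = r[0]
    if tag == "char":
        return {ONE} if r[1] == c else set()
    if tag == "alt":
        return pder(c, r[1]) | pder(c, r[2])
    if tag == "star":
        return {seq(p, r) for p in pder(c, r[1])}
    if tag == "seq":
        result = {seq(p, r[2]) for p in pder(c, r[1])}
        if nullable(r[1]):
            result = result | pder(c, r[2])
        return result
    return set()   # zero and one have no partial derivatives

def matches(r, s):
    # the set of residuals plays the role of the current NFA states
    states = {r}
    for c in s:
        states = {p for q in states for p in pder(c, q)}
    return any(nullable(q) for q in states)
</code></pre>

  <p>
  For example, for the regular expression <code>(a|b)*abb</code> the call <code>pder("a", r)</code> returns two
  residuals: the expression itself and <code>bb</code>.
  </p>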

  <p>
  <B>Literature:</B> 
  The place to start with this project is obviously this
  <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">paper</A>.
  Traditional methods for regular expression matching are explained
  in the Wikipedia articles 
  <A HREF="http://en.wikipedia.org/wiki/DFA_minimization">here</A> and 
  <A HREF="http://en.wikipedia.org/wiki/Powerset_construction">here</A>.
  The authoritative <A HREF="http://infolab.stanford.edu/~ullman/ialc.html">book</A>
  on automata and regular expressions is by John Hopcroft and Jeffrey Ullman (available in the library). 
  There is also an online course about this topic by Ullman at 
  <A HREF="https://www.coursera.org/course/automata">Coursera</A>, though IMHO not 
  done with love. 
  Finally, there are millions of other pointers about regular expression
  matching on the Net. Test cases for &quot;<A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">evil</A>&quot;
  regular expressions can be obtained from <A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">here</A>.
  </p>

  <p>
  <B>Skills:</B> 
  This is a project for a student with an interest in theory and some
  reasonable programming skills. The project can be easily implemented
  in languages like
  <A HREF="http://www.scala-lang.org/">Scala</A>,
  <A HREF="http://en.wikipedia.org/wiki/Standard_ML">ML</A>,  
  <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>, 
  <A HREF="http://www.python.org">Python</A>, etc.
  </p>

<!--
<li> <H4>[CU2] Equivalence Checking of Regular Expressions</H4>

  <p>
  <B>Description:</b>  
  Solving the problem of deciding the equivalence of regular expressions can be used
  to decide a number of problems in automated reasoning. Recently, 
  <A HREF="http://www.cs.unibo.it/~asperti/">Andrea Asperti</A>
  proposed a simple method for deciding regular expression equivalence described
  <A HREF="http://www.cs.unibo.it/~asperti/PAPERS/compact.pdf">here</A>. 
  The task is to implement this method and test it on examples.
  It would be also interesting to see whether Asperti's method applies to
  extended regular expressions, described
  <A HREF="http://ww2.cs.mu.oz.au/~sulzmann/manuscript/reg-exp-partial-derivatives.pdf">here</A>.
  </p>

  <p>
  <B>Literature:</B> 
  The central literature is obviously the papers
  <A HREF="http://www.cs.unibo.it/~asperti/PAPERS/compact.pdf">here</A> and
  <A HREF="http://ww2.cs.mu.oz.au/~sulzmann/manuscript/reg-exp-partial-derivatives.pdf">here</A>.
  Asperti has also some slides <A HREF="http://www.cs.unibo.it/~asperti/SLIDES/regular.pdf">here</a>.
  More references about regular expressions can be found
  <A HREF="http://en.wikipedia.org/wiki/Regular_expression">here</A>. Like in
  [CU1], I will give a lot of the background of this project in
  my Automata and Formal Languages course (6CCS3AFL).
  </p>  

  <p>
  <B>Skills:</B> 
  This is a project for a student with a passion for theory and some
  reasonable programming skills. The project can be easily implemented
  in languages like
  <A HREF="http://www.scala-lang.org/">Scala</A>,
  <A HREF="http://en.wikipedia.org/wiki/Standard_ML">ML</A>,  
  <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>, 
  <A HREF="http://www.python.org">Python</A>, etc.
  Being able to read <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>
  code is beneficial for the part involving extended regular expressions.
  </p>
-->

<li> <H4>[CU3] Machine Code Generation for a Simple Compiler</H4>

  <p>
  <b>Description:</b> 
  Compilers translate high-level programs that humans can read and write into
  efficient machine code that can be run on a CPU or virtual machine.
  I recently implemented a very simple compiler for a very simple functional
  programming language following this 
  <A HREF="http://www.cs.princeton.edu/~dpw/papers/tal-toplas.pdf">paper</A> 
  (also described <A HREF="http://www.cs.princeton.edu/~dpw/papers/tal-tr.pdf">here</A>).
  My code, written in <A HREF="http://www.scala-lang.org/">Scala</A>, of this compiler is 
  <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/compiler.scala">here</A>.
  The compiler can deal with simple programs involving natural numbers, such
  as Fibonacci numbers
  or factorial (but it can be easily extended - that is not the point).
  </p>

  <p>
  While the hard work has been done (understanding the two papers above),
  my compiler only produces some idealised machine code. For example I
  assume there are infinitely many registers. The goal of this
  project is to generate machine code that is more realistic and can
  run on a CPU, like x86, or run on a virtual machine, say the JVM. 
  This probably gives a speedup of a thousand times in comparison to
  my naive machine code and virtual machine. The project
  requires digging into the literature about real CPUs and generating 
  real machine code. 
  </p>
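
  <p>
  To illustrate what generating code for a stack-based virtual machine like the JVM involves, here is a
  minimal Python sketch. It is not taken from the compiler linked above: it only handles arithmetic expressions,
  encoded as nested tuples (an assumption of this sketch), and emits the JVM mnemonics <code>ldc</code>,
  <code>iadd</code>, <code>isub</code> and <code>imul</code> as plain strings. The function <code>run</code> is a toy
  interpreter for checking the output and ignores 32-bit overflow.
  </p>

<pre><code>
OPCODES = {"+": "iadd", "-": "isub", "*": "imul"}

def compile_expr(expr):
    """Translate an expression into a list of JVM-style stack instructions."""
    if isinstance(expr, int):
        return ["ldc %d" % expr]   # push a constant onto the stack
    op, left, right = expr
    # operands first, operator last: the operator finds both results on the stack
    return compile_expr(left) + compile_expr(right) + [OPCODES[op]]

def run(instructions):
    """Toy stack machine for checking the generated code."""
    stack = []
    for instruction in instructions:
        if instruction.startswith("ldc "):
            stack.append(int(instruction[4:]))
            continue
        right = stack.pop()
        left = stack.pop()
        if instruction == "iadd":
            stack.append(left + right)
        elif instruction == "isub":
            stack.append(left - right)
        elif instruction == "imul":
            stack.append(left * right)
        else:
            raise ValueError("unknown instruction: " + instruction)
    return stack.pop()

if __name__ == "__main__":
    code = compile_expr(("+", 1, ("*", 2, 3)))
    print("\n".join(code))
    print(run(code))
</code></pre>

  <p>
  No registers are needed here because all intermediate results live on the stack. Generating x86 code is
  harder precisely because the values have to be placed into a limited number of registers.
  </p>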

  <p>
  <B>Literature:</B>
  There is a lot of literature about compilers 
  (for example <A HREF="http://www.cs.princeton.edu/~appel/papers/cwc.html">this book</A> -
  I can lend you my copy for the duration of the project). A very good overview article
  about implementing compilers by 
  <A HREF="http://tratt.net/laurie/">Laurie Tratt</A> is 
  <A HREF="http://tratt.net/laurie/tech_articles/articles/how_difficult_is_it_to_write_a_compiler">here</A>.
  An introduction to x86 machine code is <A HREF="http://ianseyler.github.com/easy_x86-64/">here</A>.
  Intel's official manual for the x86 instruction set is 
  <A HREF="http://download.intel.com/design/intarch/manuals/24319101.pdf">here</A>. 
  A simple assembler for the JVM is described <A HREF="http://jasmin.sourceforge.net">here</A>.
  An interesting twist of this project is to not generate code for a CPU, but
  for the intermediate language of the <A HREF="http://llvm.org">LLVM</A> compiler
  (also described <A HREF="https://wiki.aalto.fi/display/t1065450/LLVM+IR">here</A> and
  <A HREF="http://llvm.org/docs/LangRef.html">here</A>). If you want to see
  what machine code looks like, you can compile a C program using <code>gcc -S</code>.
  </p>

  <p>
  <B>Skills:</B> 
  This is a project for a student with a deep interest in programming languages and
  compilers. Since my compiler is implemented in <A HREF="http://www.scala-lang.org/">Scala</A>,
  it would make sense to continue this project in this language. I can be
  of help with questions and books about <A HREF="http://www.scala-lang.org/">Scala</A>.
  But if Scala is a problem, my code can also be translated quickly into any other functional
  language. 
  </p>

<li> <H4>[CU4] Implementation of Register Spilling Algorithms</H4>
  
  <p>
  <b>Description:</b> 
  This project is similar to [CU3]. The emphasis here, however, is on the
  implementation and comparison of register spilling algorithms, also often called register allocation 
  algorithms. They are part of any respectable compiler. As said
  in [CU3], my simple compiler lacks them and assumes an infinite number of registers instead.
  Real CPUs, however, only provide a fixed number of registers (for example
  x86-64 has 16 general purpose registers). Whenever a program needs
  to hold more values than there are registers, the values need to be &ldquo;spilled&rdquo;
  into the main memory. Register spilling algorithms try to minimise
  this spilling, since fetching values from main memory is a costly 
  operation. 
  </p>

  <p>
  The classic algorithm for register spilling uses a
  <A HREF="http://en.wikipedia.org/wiki/Register_allocation">graph-colouring method</A>.
  However, for some time the <A HREF="http://llvm.org">LLVM</A> compiler
  used a supposedly more efficient method, called the linear scan allocation method
  (described 
  <A HREF="http://www.cs.ucla.edu/~palsberg/course/cs132/linearscan.pdf">here</A>).
  It was later decided to abandon this method in favour of 
  a <A HREF="http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html">
  greedy register allocation</A> method. It would be nice if this project could find out
  what the issues are with these methods and implement at least one of them for the 
  simple compiler referenced in [CU3].
  </p>
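
  <p>
  The basic idea behind graph colouring can be sketched in a few lines of Python. This is a deliberately
  simplified greedy colouring, not the full algorithm from the literature and not LLVM's allocator. It assumes
  that a liveness analysis has already produced, for every program point, the set of variables that are live
  there; the function names and the ordering heuristic are assumptions made for this illustration.
  </p>

<pre><code>
from itertools import combinations

def interference_graph(live_sets):
    """Two variables interfere if they are live at the same program point."""
    graph = {}
    for live in live_sets:
        for v in live:
            graph.setdefault(v, set())
        for u, v in combinations(sorted(live), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph

def colour(graph, num_registers):
    """Greedily give each variable a register number, or spill it."""
    assignment = {}
    spilled = []
    # visit the variables with the most neighbours first (a simple heuristic)
    for v in sorted(graph, key=lambda v: (-len(graph[v]), v)):
        taken = {assignment[n] for n in graph[v] if n in assignment}
        free = [r for r in range(num_registers) if r not in taken]
        if free:
            assignment[v] = free[0]
        else:
            spilled.append(v)   # no register left: keep this variable in memory
    return assignment, spilled

if __name__ == "__main__":
    live_sets = [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}]
    print(colour(interference_graph(live_sets), 2))
</code></pre>

  <p>
  With only two registers one of the three variables that are live at the same time has to be spilled; with
  three registers everything fits.
  </p>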

  <p>
  <B>Literature:</B> 
  The graph colouring method is described in Andrew Appel's 
  <A HREF="http://www.cs.princeton.edu/~appel/modern/java/">book</A> on compilers
  (I can give you my copy of this book, if it is not available in the library).
  There is also a survey 
  <A HREF="http://compilers.cs.ucla.edu/fernando/publications/drafts/survey.pdf">article</A> 
  about register allocation algorithms with further pointers.
  </p>

  <p>
  <B>Skills:</B> 
  Same skills as [CU3].
  </p>

<li> <H4>[CU5] A Student Polling System</H4>

  <p>
  <B>Description:</B>
  One of the more annoying aspects of giving a lecture is asking the students a question
  and, no matter how easy the question is, not 
  receiving an answer. Recently, the online course system 
  <A HREF="http://www.udacity.com">Udacity</A> made an art out of
  asking questions during lectures (see for example the
  <A HREF="http://www.udacity.com/overview/Course/cs253/CourseRev/apr2012">Web Application Engineering</A> 
  course CS253).
  The lecturer there gives multiple-choice questions as part of the lecture and the students need to 
  click on the appropriate answer. This works very well in the online world. 
  For  &ldquo;real-world&rdquo; lectures, the department has some 
  <A HREF="http://en.wikipedia.org/wiki/Audience_response">clickers</A>
  (these are little devices that are part of an audience response system). However, 
  they are a logistical nightmare for the lecturer: they need to be distributed 
  during the lecture and collected at the end. Nowadays, when students
  come with their own laptop or smartphone to lectures, this can
  be improved.
  </p>

  <p>
  The task of this project is to implement an online student
  polling system. The lecturer should be able to prepare 
  questions beforehand (encoded as some web-form) and be able to 
  show them during the lecture. The students
  can give their answers by clicking on the corresponding webpage.
  The lecturer can then collect the responses online and evaluate them 
  immediately. Such a system is sometimes called
  <A HREF="http://en.wikipedia.org/wiki/Audience_response#Smartphone_.2F_HTTP_voting">HTTP voting</A>. 
  There are a number of commercial
  solutions for this problem, but they are not easy to use (in addition
  to being ridiculously expensive). A good student can easily improve upon
  what they provide. 
  </p>

  <p>
  The problem of student polling is not as hard as 
  <A HREF="http://en.wikipedia.org/wiki/Electronic_voting">electronic voting</A>, 
  which essentially is still an unsolved problem in Computer Science. The
  students only need to be prevented from answering a question more than once, which would skew
  any statistics. Unlike electronic voting, no audit trail needs to be kept
  for student polling. Restricting the number of answers can probably be solved 
  by setting appropriate cookies on the students'
  computers or smartphones.
  </p>
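
  <p>
  The server-side bookkeeping for this can be sketched as follows. This is a minimal Python illustration
  without any web framework: the class <code>Poll</code> and its methods are hypothetical names, and the token
  returned by <code>issue_token</code> is assumed to be stored in a cookie on the student's device and sent
  back with the answer.
  </p>

<pre><code>
import uuid
from collections import Counter

class Poll:
    def __init__(self, question, options):
        self.question = question
        self.options = list(options)
        self.issued = set()   # tokens handed out to students
        self.votes = {}       # token mapped to the chosen option

    def issue_token(self):
        """Create a random value to be stored in a cookie."""
        token = uuid.uuid4().hex
        self.issued.add(token)
        return token

    def answer(self, token, option):
        """Record an answer; refuse unknown tokens, repeats and invalid options."""
        if token not in self.issued or token in self.votes:
            return False
        if option not in self.options:
            return False
        self.votes[token] = option
        return True

    def tally(self):
        """Number of answers per option, in the order the options were given."""
        counts = Counter(self.votes.values())
        return {option: counts[option] for option in self.options}
</code></pre>

  <p>
  A student who deletes the cookie can obtain a fresh token and answer again, so this scheme only discourages
  repeated answers. For student polling that is probably good enough.
  </p>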

  <p>
  <B>Literature:</B> 
  The project requires fluency in a web-programming language (for example 
  <A HREF="http://en.wikipedia.org/wiki/JavaScript">JavaScript</A>,
  <A HREF="http://en.wikipedia.org/wiki/PHP">PHP</A>, 
  Java, <A HREF="http://www.python.org">Python</A>, 
  <A HREF="http://en.wikipedia.org/wiki/Go_(programming_language)">Go</A>, 
  <A HREF="http://www.scala-lang.org/">Scala</A>,
  <A HREF="http://en.wikipedia.org/wiki/Ruby_(programming_language)">Ruby</A>) 
  and possibly a cloud application platform (for example
  <A HREF="https://developers.google.com/appengine/">Google App Engine</a> or 
  <A HREF="http://www.heroku.com">Heroku</A>).
  For web-programming the 
  <A HREF="http://www.udacity.com/overview/Course/cs253/CourseRev/apr2012">Web Application Engineering</A>
  course at <A HREF="http://www.udacity.com">Udacity</A> is a good starting point 
  to be aware of the issues involved. This course uses <A HREF="http://www.python.org">Python</A>.
  To evaluate the answers from the students, Google's 
  <A HREF="https://developers.google.com/chart/image/docs/making_charts">Chart Tools</A>
  might be useful, which are also described in this 
  <A HREF="http://www.youtube.com/watch?v=NZtgT4jgnE8">YouTube</A> video.
  </p>

  <p>
  <B>Skills:</B> 
  In order to provide convenience for the lecturer, this project needs very good web-programming skills. A 
  <A HREF="http://en.wikipedia.org/wiki/Hacker_(programmer_subculture)">hacker mentality</A>
  (see above) is probably very beneficial: web-programming is an area that only emerged recently and
  many tools still lack maturity. You probably have to experiment a lot with several different
  languages and tools.
  </p>

<li> <H4>[CU6] Implementation of a Distributed Clock-Synchronisation Algorithm developed at NASA</H4>
  
  <p>
  <B>Description:</B>
  There are many algorithms for synchronising clocks. This
  <A HREF="http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20120000054_2011025573.pdf">paper</A> 
  describes a new algorithm for clocks that communicate by exchanging
  messages and thereby reach a state in which (within some bound) all clocks are synchronised.
  A slightly longer and more detailed paper about the algorithm is 
  <A HREF="http://hdl.handle.net/2060/20110020812">here</A>.
  The point of this project is to implement this algorithm and simulate networks of clocks.
  </p>
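
  <p>
  To give a feeling for what simulating a network of clocks means, here is a minimal Python sketch. It is
  <I>not</I> the algorithm from the NASA paper: it uses a naive averaging scheme in synchronous rounds, chosen
  only because it fits into a few lines. The function name <code>simulate</code> and its parameters are
  assumptions made for this illustration.
  </p>

<pre><code>
def simulate(clocks, rates, neighbours, rounds):
    """Return the spread (largest minus smallest clock value) after each round."""
    spreads = []
    for _ in range(rounds):
        # every clock ticks at its own, slightly different rate
        clocks = [c + r for c, r in zip(clocks, rates)]
        # every node receives its neighbours' values and averages them with its own
        clocks = [
            (clocks[i] + sum(clocks[j] for j in neighbours[i])) / (1 + len(neighbours[i]))
            for i in range(len(clocks))
        ]
        spreads.append(max(clocks) - min(clocks))
    return spreads

if __name__ == "__main__":
    # four clocks arranged in a ring, starting far apart
    ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
    for spread in simulate([0.0, 10.0, 20.0, 30.0], [1.0, 1.01, 0.99, 1.0], ring, 10):
        print(round(spread, 4))
</code></pre>

  <p>
  In this toy model the spread shrinks quickly but never reaches zero, because the clocks keep drifting apart
  between rounds. A real implementation also has to deal with message delays and with nodes that do not act in
  lock-step.
  </p>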

  <p>
  <B>Literature:</B> 
  There is a wide range of literature on clock synchronisation algorithms. 
  Some pointers are given in this
  <A HREF="http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20120000054_2011025573.pdf">paper</A>,
  which describes the algorithm to be implemented in this project. Pointers
  are given also <A HREF="http://en.wikipedia.org/wiki/Clock_synchronization">here</A>.
  </p>

  <p>
  <B>Skills:</B> 
  In order to implement a simulation of a network of clocks, you need to tackle
  concurrency. You can do this for example in the programming language
  <A HREF="http://www.scala-lang.org/">Scala</A> with the help of the 
  <A HREF="http://akka.io">Akka</a> library. This library enables you to send messages
  between different <I>actors</I>. <A HREF="http://www.scala-lang.org/node/242">Here</A> 
  are some examples that explain how to implement exchanging messages between actors. 
  </p>

</ul>
</TD>
</TR>
</TABLE>

<P>
<!-- Created: Tue Mar  4 00:23:25 GMT 1997 -->
<!-- hhmts start -->
Last modified: Wed Sep 12 16:30:03 GMT 2012
<!-- hhmts end -->
<a href="http://validator.w3.org/check/referer">[Validate this page.]</a>
</BODY>
</HTML>