--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/msc-projects-12.html Tue Nov 06 00:04:58 2012 +0000
@@ -0,0 +1,408 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<HEAD>
+<TITLE>2012/13 MSc Projects</TITLE>
+<BASE HREF="http://www.inf.kcl.ac.uk/staff/urbanc/">
+<script type="text/javascript" src="striper.js"></script>
+<link rel="stylesheet" href="nominal.css">
+</HEAD>
+<BODY TEXT="#000000"
+ BGCOLOR="#4169E1"
+ LINK="#0000EF"
+ VLINK="#51188E"
+ ALINK="#FF0000"
+ ONLOAD="striper('ul','striped','li','first,second')">
+
+
+
+<TABLE WIDTH="100%"
+ BGCOLOR="#4169E1"
+ BORDER="0"
+ FRAME="border"
+ CELLPADDING="10"
+ CELLSPACING="2"
+ RULES="all">
+
+<TR>
+<TD BGCOLOR="#FFFFFF"
+ WIDTH="75%"
+ VALIGN="TOP">
+
+<H2>2012/13 MSc Projects</H2>
+<H4>Supervisor: Christian Urban</H4>
+<H4>Email: christian dot urban at kcl dot ac dot uk, Office: Strand Building S1.27</H4>
+<H4>If you are interested in a project, please send me an email and we can discuss details. Please include
+a short description of your programming skills and Computer Science background in your first email.
+I will also need your King's username in order to book the project for you. Thanks.</H4>
+
+<H4>Note that besides being a lecturer at the theoretical end of Computer Science, I am also a passionate
+ <A HREF="http://en.wikipedia.org/wiki/Hacker_(programmer_subculture)">hacker</A>,
+ defined as “a person who enjoys exploring the details of programmable systems and
+ stretching their capabilities, as opposed to most users, who prefer to learn only the minimum
+ necessary.” I am always happy to supervise like-minded students.</H4>
+
+<ul class="striped">
+<li> <H4>[CU1] Regular Expression Matching and Partial Derivatives</H4>
+
+ <p>
+ <B>Description:</b>
+ <A HREF="http://en.wikipedia.org/wiki/Regular_expression">Regular expressions</A>
+ are extremely useful for many text-processing tasks: finding patterns in texts,
+ lexing programs, syntax highlighting and so on. Given that regular expressions were
+ introduced in the 1950s by <A HREF="http://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</A>, you might think
+ regular expressions have since been studied to death. But you would definitely be mistaken: in fact they are still
+ an active research area. For example,
+ <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">this paper</A>
+ about regular expression matching and partial derivatives was presented this summer at the international
+ PPDP'12 conference.</p>
+
+ <p>The background for this project is that some regular expressions are
+ "<A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">evil</A>"
+ and can "stab you in the back" according to
+ this recent <A HREF="http://tech.blog.cueup.com/regular-expressions-will-stab-you-in-the-back">blog post</A>.
+ For example, if you take in <A HREF="http://www.python.org">Python</A> or
+ in <A HREF="http://www.ruby-lang.org/en/">Ruby</A> (probably also in other mainstream programming languages) the
+ innocent-looking regular expression <code>a?{28}a{28}</code> and match it, say, against the string
+ <code>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</code>, you will soon notice that your CPU usage goes to 100%. In fact,
+ Python and Ruby need approximately 30 seconds for matching this string. You can try it for yourself:
+ <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re.py">re.py</A> (Python version) and
+ <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re-internal.rb">re.rb</A>
+ (Ruby version). You can imagine an attacker
+ mounting a nice <A HREF="http://en.wikipedia.org/wiki/Denial-of-service_attack">DoS attack</A> against
+ your program if it contains such an "evil" regular expression. Actually,
+ <A HREF="http://www.scala-lang.org/">Scala</A> (and also Java) is almost immune to such
+ attacks, as it can deal with strings of up to 4,300 <code>a</code>s in less than a second. But if you scale
+ the regular expression and string further to, say, 4,600 <code>a</code>s, you get a <code>StackOverflowError</code>
+ that crashes your program.
+ </p>
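+
+ <p>
+ The experiment above can be sketched in a few lines of Python. This is only an illustration and
+ not the linked <code>re.py</code> script: the function name <code>time_evil_match</code> is made up
+ for this example, and the pattern is written out in expanded form (n copies of <code>a?</code>
+ followed by n copies of <code>a</code>), which is an assumption about what the scripts test.
+ The values of n are kept small on purpose, since a backtracking matcher is expected to need
+ roughly twice as long for every additional <code>a</code>.
+ </p>
+
+<pre>
+import re
+import time
+
+def time_evil_match(n):
+    """Match a string of n a's against n optional a's followed by n a's."""
+    pattern = "a?" * n + "a" * n      # expanded form of the "evil" pattern
+    text = "a" * n
+    start = time.perf_counter()
+    matched = re.match(pattern, text) is not None
+    return matched, time.perf_counter() - start
+
+if __name__ == "__main__":
+    for n in (10, 15, 20):            # small sizes only; larger n gets slow quickly
+        matched, seconds = time_evil_match(n)
+        print(f"n={n:2d}  matched={matched}  {seconds:.3f}s")
+</pre>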
+
+ <p>
+ On a rainy afternoon, I implemented
+ <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re3.scala">this</A>
+ regular expression matcher in Scala. It is not as fast as the official one in Scala, but
+ it can match up to 11,000 <code>a</code>s in less than 5 seconds without raising any exception
+ (remember Python and Ruby both need nearly 30 seconds to process 28(!) <code>a</code>s, and Scala's
+ official matcher maxes out at 4,600 <code>a</code>s). My matcher is approximately
+ 85 lines of code and based on the concept of
+ <A HREF="http://lambda-the-ultimate.org/node/2293">derivatives of regular expressions</A>.
+ Derivatives were introduced in 1964 by <A HREF="http://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)">
+ Janusz Brzozowski</A>, but according to this
+ <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">paper</A> they had been lost in the "sands of time".
+ The advantage of derivatives is that they completely side-step the usual
+ <A HREF="http://hackingoff.com/compilers/regular-expression-to-nfa-dfa">translations</A> of regular expressions
+ into NFAs or DFAs, which can introduce the exponential behaviour exhibited by the regular
+ expression matchers in Python and Ruby.
+ </p>
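+
+ <p>
+ To give a flavour of the idea, here is a minimal Python sketch of matching with Brzozowski
+ derivatives. It is not the Scala matcher linked above; all class and function names
+ (<code>Ntimes</code>, <code>der</code>, <code>nullable</code>, <code>evil</code> and so on) are chosen
+ for this example only. The derivative of a regular expression with respect to a character
+ describes what remains to be matched after that character has been read; a string matches if,
+ after taking derivatives for all its characters, the result can match the empty string.
+ The helper functions <code>alt</code> and <code>seq</code> perform some simple simplifications
+ so that the derivatives stay small.
+ </p>
+
+<pre>
+from dataclasses import dataclass
+
+class Rexp:
+    pass
+
+@dataclass(frozen=True)
+class Zero(Rexp):            # matches no string at all
+    pass
+
+@dataclass(frozen=True)
+class One(Rexp):             # matches only the empty string
+    pass
+
+@dataclass(frozen=True)
+class Chr(Rexp):             # matches a single character
+    c: str
+
+@dataclass(frozen=True)
+class Alt(Rexp):             # r1 or r2
+    r1: Rexp
+    r2: Rexp
+
+@dataclass(frozen=True)
+class Seq(Rexp):             # r1 followed by r2
+    r1: Rexp
+    r2: Rexp
+
+@dataclass(frozen=True)
+class Star(Rexp):            # zero or more repetitions of r
+    r: Rexp
+
+@dataclass(frozen=True)
+class Ntimes(Rexp):          # exactly n repetitions of r
+    r: Rexp
+    n: int
+
+def alt(r1, r2):
+    """Build an alternative, dropping Zero and direct duplicates."""
+    if isinstance(r1, Zero): return r2
+    if isinstance(r2, Zero): return r1
+    return r1 if r1 == r2 else Alt(r1, r2)
+
+def seq(r1, r2):
+    """Build a sequence, simplifying with Zero and One."""
+    if isinstance(r1, Zero) or isinstance(r2, Zero): return Zero()
+    if isinstance(r1, One): return r2
+    if isinstance(r2, One): return r1
+    return Seq(r1, r2)
+
+def nullable(r):
+    """Can r match the empty string?"""
+    if isinstance(r, (One, Star)): return True
+    if isinstance(r, Alt): return nullable(r.r1) or nullable(r.r2)
+    if isinstance(r, Seq): return nullable(r.r1) and nullable(r.r2)
+    if isinstance(r, Ntimes): return r.n == 0 or nullable(r.r)
+    return False             # Zero and Chr
+
+def der(c, r):
+    """Derivative of r with respect to the character c."""
+    if isinstance(r, Chr): return One() if r.c == c else Zero()
+    if isinstance(r, Alt): return alt(der(c, r.r1), der(c, r.r2))
+    if isinstance(r, Seq):
+        left = seq(der(c, r.r1), r.r2)
+        return alt(left, der(c, r.r2)) if nullable(r.r1) else left
+    if isinstance(r, Star): return seq(der(c, r.r), r)
+    if isinstance(r, Ntimes):
+        return Zero() if r.n == 0 else seq(der(c, r.r), Ntimes(r.r, r.n - 1))
+    return Zero()            # Zero and One
+
+def matches(r, s):
+    for c in s:
+        r = der(c, r)
+    return nullable(r)
+
+def evil(n):
+    """n optional a's followed by n a's."""
+    return Seq(Ntimes(Alt(Chr("a"), One()), n), Ntimes(Chr("a"), n))
+
+if __name__ == "__main__":
+    print(matches(evil(28), "a" * 28))
+</pre>
+
+ <p>
+ Note that no automaton is built and no backtracking takes place: the matcher only rewrites
+ the regular expression once per input character.
+ </p>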
+
+ <p>
+ Now the authors of the
+ <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">PPDP'12 paper</A> mentioned
+ above claim their matcher is even faster than mine and can deal with even more features of regular expressions
+ (for example subexpression matching, which my rainy-afternoon matcher lacks). I am sure they thought
+ about the problem much longer than a single afternoon. The task
+ in this project is to find out how good they actually are by implementing the results from their paper.
+ Their approach is based on the concept of partial derivatives introduced in 1994 by
+ <A HREF="http://reference.kfupm.edu.sa/content/p/a/partial_derivatives_of_regular_expressio_1319383.pdf">Valentin Antimirov</A>.
+ I used them <A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Publications/rexp.pdf">once</A>
+ in order to prove the <A HREF="http://en.wikipedia.org/wiki/Myhill–Nerode_theorem">Myhill-Nerode theorem</A>
+ by using only regular expressions.
+ </p>
+
+ <p>
+ <B>Literature:</B>
+ The place to start with this project is obviously this
+ <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">paper</A>.
+ Traditional methods for regular expression matching are explained
+ in the Wikipedia articles
+ <A HREF="http://en.wikipedia.org/wiki/DFA_minimization">here</A> and
+ <A HREF="http://en.wikipedia.org/wiki/Powerset_construction">here</A>.
+ The authoritative <A HREF="http://infolab.stanford.edu/~ullman/ialc.html">book</A>
+ on automata and regular expressions is by John Hopcroft and Jeffrey Ullman (available in the library).
+ There is also an online course about this topic by Ullman at
+ <A HREF="https://www.coursera.org/course/automata">Coursera</A>, though IMHO not
+ done with love.
+ Finally, there are millions of other pointers about regular expression
+ matching on the Net. Test cases for "<A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">evil</A>"
+ regular expressions can be obtained from <A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">here</A>.
+ </p>
+
+ <p>
+ <B>Skills:</B>
+ This is a project for a student with an interest in theory and some
+ reasonable programming skills. The project can be easily implemented
+ in languages like
+ <A HREF="http://www.scala-lang.org/">Scala</A>,
+ <A HREF="http://en.wikipedia.org/wiki/Standard_ML">ML</A>,
+ <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>,
+ <A HREF="http://www.python.org">Python</A>, etc.
+ </p>
+
+<!--
+<li> <H4>[CU2] Equivalence Checking of Regular Expressions</H4>
+
+ <p>
+ <B>Description:</b>
+ Solving the problem of deciding the equivalence of regular expressions can be used
+ to decide a number of problems in automated reasoning. Recently,
+ <A HREF="http://www.cs.unibo.it/~asperti/">Andrea Asperti</A>
+ proposed a simple method for deciding regular expression equivalence described
+ <A HREF="http://www.cs.unibo.it/~asperti/PAPERS/compact.pdf">here</A>.
+ The task is to implement this method and test it on examples.
+ It would be also interesting to see whether Asperti's method applies to
+ extended regular expressions, described
+ <A HREF="http://ww2.cs.mu.oz.au/~sulzmann/manuscript/reg-exp-partial-derivatives.pdf">here</A>.
+ </p>
+
+ <p>
+ <B>Literature:</B>
+ The central literature is obviously the papers
+ <A HREF="http://www.cs.unibo.it/~asperti/PAPERS/compact.pdf">here</A> and
+ <A HREF="http://ww2.cs.mu.oz.au/~sulzmann/manuscript/reg-exp-partial-derivatives.pdf">here</A>.
+ Asperti has also some slides <A HREF="http://www.cs.unibo.it/~asperti/SLIDES/regular.pdf">here</a>.
+ More references about regular expressions can be found
+ <A HREF="http://en.wikipedia.org/wiki/Regular_expression">here</A>. Like in
+ [CU1], I will give a lot of the background of this project in
+ my Automata and Formal Languages course (6CCS3AFL).
+ </p>
+
+ <p>
+ <B>Skills:</B>
+ This is a project for a student with a passion for theory and some
+ reasonable programming skills. The project can be easily implemented
+ in languages like
+ <A HREF="http://www.scala-lang.org/">Scala</A>,
+ <A HREF="http://en.wikipedia.org/wiki/Standard_ML">ML</A>,
+ <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>,
+ <A HREF="http://www.python.org">Python</A>, etc.
+ Being able to read <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>
+ code is beneficial for the part involving extended regular expressions.
+ </p>
+-->
+
+<li> <H4>[CU3] Machine Code Generation for a Simple Compiler</H4>
+
+ <p>
+ <b>Description:</b>
+ Compilers translate high-level programs that humans can read and write into
+ efficient machine code that can be run on a CPU or virtual machine.
+ I recently implemented a very simple compiler for a very simple functional
+ programming language following this
+ <A HREF="http://www.cs.princeton.edu/~dpw/papers/tal-toplas.pdf">paper</A>
+ (also described <A HREF="http://www.cs.princeton.edu/~dpw/papers/tal-tr.pdf">here</A>).
+ My code for this compiler, written in <A HREF="http://www.scala-lang.org/">Scala</A>, is
+ <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/compiler.scala">here</A>.
+ The compiler can deal with simple programs involving natural numbers, such
+ as Fibonacci numbers
+ or factorial (but it can be easily extended - that is not the point).
+ </p>
+
+ <p>
+ While the hard work has been done (understanding the two papers above),
+ my compiler only produces some idealised machine code. For example I
+ assume there are infinitely many registers. The goal of this
+ project is to generate machine code that is more realistic and can
+ run on a CPU, like x86, or run on a virtual machine, say the JVM.
+ This will probably give a speedup of a thousand times in comparison to
+ my naive machine code and virtual machine. The project
+ requires digging into the literature about real CPUs and generating
+ real machine code.
+ </p>
+
+ <p>
+ <B>Literature:</B>
+ There is a lot of literature about compilers
+ (for example <A HREF="http://www.cs.princeton.edu/~appel/papers/cwc.html">this book</A> -
+ I can lend you my copy for the duration of the project). A very good overview article
+ about implementing compilers by
+ <A HREF="http://tratt.net/laurie/">Laurie Tratt</A> is
+ <A HREF="http://tratt.net/laurie/tech_articles/articles/how_difficult_is_it_to_write_a_compiler">here</A>.
+ An introduction into x86 machine code is <A HREF="http://ianseyler.github.com/easy_x86-64/">here</A>.
+ Intel's official manual for the x86 instruction set is
+ <A HREF="http://download.intel.com/design/intarch/manuals/24319101.pdf">here</A>.
+ A simple assembler for the JVM is described <A HREF="http://jasmin.sourceforge.net">here</A>.
+ An interesting twist of this project is to not generate code for a CPU, but
+ for the intermediate language of the <A HREF="http://llvm.org">LLVM</A> compiler
+ (also described <A HREF="https://wiki.aalto.fi/display/t1065450/LLVM+IR">here</A> and
+ <A HREF="http://llvm.org/docs/LangRef.html">here</A>). If you want to see
+ what machine code looks like, you can compile your C program using <code>gcc -S</code>.
+ </p>
+
+ <p>
+ <B>Skills:</B>
+ This is a project for a student with a deep interest in programming languages and
+ compilers. Since my compiler is implemented in <A HREF="http://www.scala-lang.org/">Scala</A>,
+ it would make sense to continue this project in this language. I can be
+ of help with questions and books about <A HREF="http://www.scala-lang.org/">Scala</A>.
+ But if Scala is a problem, my code can also be translated quickly into any other functional
+ language.
+ </p>
+
+<li> <H4>[CU4] Implementation of Register Spilling Algorithms</H4>
+
+ <p>
+ <b>Description:</b>
+ This project is similar to [CU3]. The emphasis here, however, is on the
+ implementation and comparison of register spilling algorithms, also often called register allocation
+ algorithms. They are part of any respectable compiler. As said
+ in [CU3], however, my simple compiler lacks them and assumes an infinite number of registers instead.
+ Real CPUs, however, only provide a fixed number of registers (for example
+ x86-64 has 16 general purpose registers). Whenever a program needs
+ to hold more values than there are registers, some values need to be “spilled”
+ into the main memory. Register spilling algorithms try to minimise
+ this spilling, since fetching values from main memory is a costly
+ operation.
+ </p>
+
+ <p>
+ The classic algorithm for register spilling uses a
+ <A HREF="http://en.wikipedia.org/wiki/Register_allocation">graph-colouring method</A>.
+ However, for some time the <A HREF="http://llvm.org">LLVM</A> compiler
+ used a supposedly more efficient method, called the linear scan allocation method
+ (described
+ <A HREF="http://www.cs.ucla.edu/~palsberg/course/cs132/linearscan.pdf">here</A>).
+ However, it was later decided to abandon this method in favour of
+ a <A HREF="http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html">
+ greedy register allocation</A> method. It would be nice if this project could find out
+ what the issues are with these methods and implement at least one of them for the
+ simple compiler referenced in [CU3].
+ </p>
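+
+ <p>
+ As a rough illustration of the linear scan idea, here is a small Python sketch. It is a
+ simplified reading of the method and not code from LLVM or from the compiler in [CU3]; the
+ function name <code>linear_scan</code> and the representation of live intervals as
+ <code>(start, end)</code> pairs are assumptions made for this example. Variables are visited
+ in order of their start points; registers of intervals that have already ended are reused;
+ and if no register is free, the interval that ends last is spilled.
+ </p>
+
+<pre>
+def linear_scan(intervals, num_regs):
+    """Assign registers to live intervals, spilling when none is free.
+
+    intervals maps a variable name to its (start, end) live range;
+    num_regs is assumed to be at least 1.
+    Returns (assignment, spilled): a dict from names to register numbers
+    and the list of names that have to live in memory.
+    """
+    free = list(range(num_regs))
+    active = []                  # (end, name) pairs that currently hold a register
+    assignment = {}
+    spilled = []
+    by_start = sorted(intervals.items(), key=lambda item: item[1][0])
+    for name, (start, end) in by_start:
+        # Expire intervals that ended before the current one starts.
+        for old_end, old_name in sorted(active):
+            if old_end >= start:
+                break
+            active.remove((old_end, old_name))
+            free.append(assignment[old_name])
+        if free:
+            assignment[name] = free.pop(0)
+            active.append((end, name))
+            continue
+        # No register is free: spill whichever interval ends last.
+        last_end, last_name = max(active)
+        if last_end > end:
+            assignment[name] = assignment.pop(last_name)
+            active.remove((last_end, last_name))
+            active.append((end, name))
+            spilled.append(last_name)
+        else:
+            spilled.append(name)
+    return assignment, spilled
+
+if __name__ == "__main__":
+    ranges = {"a": (0, 10), "b": (1, 3), "c": (2, 8), "d": (4, 6)}
+    print(linear_scan(ranges, 2))
+</pre>
+
+ <p>
+ A graph-colouring allocator would instead build an interference graph from the same live
+ ranges; comparing the spills chosen by both methods on the same programs is one way to
+ approach the evaluation part of this project.
+ </p>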
+
+ <p>
+ <B>Literature:</B>
+ The graph colouring method is described in Andrew Appel's
+ <A HREF="http://www.cs.princeton.edu/~appel/modern/java/">book</A> on compilers
+ (I can give you my copy of this book, if it is not available in the library).
+ There is also a survey
+ <A HREF="http://compilers.cs.ucla.edu/fernando/publications/drafts/survey.pdf">article</A>
+ about register allocation algorithms with further pointers.
+ </p>
+
+ <p>
+ <B>Skills:</B>
+ Same skills as [CU3].
+ </p>
+
+<li> <H4>[CU5] A Student Polling System</H4>
+
+ <p>
+ <B>Description:</B>
+ One of the more annoying aspects of giving a lecture is to ask the students
+ a question and, no matter how easy the question is, to not
+ receive an answer. Recently, the online course system
+ <A HREF="http://www.udacity.com">Udacity</A> made an art out of
+ asking questions during lectures (see for example the
+ <A HREF="http://www.udacity.com/overview/Course/cs253/CourseRev/apr2012">Web Application Engineering</A>
+ course CS253).
+ The lecturer there gives multiple-choice questions as part of the lecture and the students need to
+ click on the appropriate answer. This works very well in the online world.
+ For “real-world” lectures, the department has some
+ <A HREF="http://en.wikipedia.org/wiki/Audience_response">clickers</A>
+ (these are little devices that are part of an audience response system). However,
+ they are a logistical nightmare for the lecturer: they need to be distributed
+ during the lecture and collected at the end. Nowadays, when students
+ come with their own laptop or smartphone to lectures, this can
+ be improved.
+ </p>
+
+ <p>
+ The task of this project is to implement an online student
+ polling system. The lecturer should be able to prepare
+ questions beforehand (encoded as some web-form) and be able to
+ show them during the lecture. The students
+ can give their answers by clicking on the corresponding webpage.
+ The lecturer can then collect the responses online and evaluate them
+ immediately. Such a system is sometimes called
+ <A HREF="http://en.wikipedia.org/wiki/Audience_response#Smartphone_.2F_HTTP_voting">HTTP voting</A>.
+ There are a number of commercial
+ solutions for this problem, but they are not easy to use (in addition
+ to being ridiculously expensive). A good student can easily improve upon
+ what they provide.
+ </p>
+
+ <p>
+ The problem of student polling is not as hard as
+ <A HREF="http://en.wikipedia.org/wiki/Electronic_voting">electronic voting</A>,
+ which essentially is still an unsolved problem in Computer Science. The
+ students only need to be prevented from answering a question more than once and thereby skewing
+ any statistics. Unlike electronic voting, no audit trail needs to be kept
+ for student polling. Restricting the number of answers can probably be solved
+ by setting appropriate cookies on the students'
+ computers or smartphones.
+ </p>
+
+ <p>
+ <B>Literature:</B>
+ The project requires fluency in a web-programming language (for example
+ <A HREF="http://en.wikipedia.org/wiki/JavaScript">JavaScript</A>,
+ <A HREF="http://en.wikipedia.org/wiki/PHP">PHP</A>,
+ Java, <A HREF="http://www.python.org">Python</A>,
+ <A HREF="http://en.wikipedia.org/wiki/Go_(programming_language)">Go</A>,
+ <A HREF="http://www.scala-lang.org/">Scala</A>,
+ <A HREF="http://en.wikipedia.org/wiki/Ruby_(programming_language)">Ruby</A>)
+ and possibly a cloud application platform (for example
+ <A HREF="https://developers.google.com/appengine/">Google App Engine</a> or
+ <A HREF="http://www.heroku.com">Heroku</A>).
+ For web-programming the
+ <A HREF="http://www.udacity.com/overview/Course/cs253/CourseRev/apr2012">Web Application Engineering</A>
+ course at <A HREF="http://www.udacity.com">Udacity</A> is a good starting point
+ to be aware of the issues involved. This course uses <A HREF="http://www.python.org">Python</A>.
+ To evaluate the answers from the students, Google's
+ <A HREF="https://developers.google.com/chart/image/docs/making_charts">Chart Tools</A>
+ might be useful, which are also described in this
+ <A HREF="http://www.youtube.com/watch?v=NZtgT4jgnE8">YouTube</A> video.
+ </p>
+
+ <p>
+ <B>Skills:</B>
+ In order to provide convenience for the lecturer, this project needs very good web-programming skills. A
+ <A HREF="http://en.wikipedia.org/wiki/Hacker_(programmer_subculture)">hacker mentality</A>
+ (see above) is probably very beneficial: web-programming is an area that only emerged recently and
+ many tools still lack maturity. You probably have to experiment a lot with several different
+ languages and tools.
+ </p>
+
+<li> <H4>[CU6] Implementation of a Distributed Clock-Synchronisation Algorithm developed at NASA</H4>
+
+ <p>
+ <B>Description:</B>
+ There are many algorithms for synchronising clocks. This
+ <A HREF="http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20120000054_2011025573.pdf">paper</A>
+ describes a new algorithm for clocks that communicate by exchanging
+ messages and thereby reach a state in which (within some bound) all clocks are synchronised.
+ A slightly longer and more detailed paper about the algorithm is
+ <A HREF="http://hdl.handle.net/2060/20110020812">here</A>.
+ The point of this project is to implement this algorithm and simulate networks of clocks.
+ </p>
+
+ <p>
+ <B>Literature:</B>
+ There is a wide range of literature on clock synchronisation algorithms.
+ Some pointers are given in this
+ <A HREF="http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20120000054_2011025573.pdf">paper</A>,
+ which describes the algorithm to be implemented in this project. Further pointers
+ are given <A HREF="http://en.wikipedia.org/wiki/Clock_synchronization">here</A>.
+ </p>
+
+ <p>
+ <B>Skills:</B>
+ In order to implement a simulation of a network of clocks, you need to tackle
+ concurrency. You can do this for example in the programming language
+ <A HREF="http://www.scala-lang.org/">Scala</A> with the help of the
+ <A HREF="http://akka.io">Akka</a> library. This library enables you to send messages
+ between different <I>actors</I>. <A HREF="http://www.scala-lang.org/node/242">Here</A>
+ are some examples that explain how to implement exchanging messages between actors.
+ </p>
+
+</ul>
+</TD>
+</TR>
+</TABLE>
+
+<P>
+<!-- Created: Tue Mar 4 00:23:25 GMT 1997 -->
+<!-- hhmts start -->
+Last modified: Wed Sep 12 16:30:03 GMT 2012
+<!-- hhmts end -->
+<a href="http://validator.w3.org/check/referer">[Validate this page.]</a>
+</BODY>
+</HTML>