added
authorChristian Urban <urbanc@in.tum.de>
Tue, 06 Nov 2012 00:04:58 +0000
changeset 154 a73de9a29bb5
parent 153 7acf8ff8cb0d
child 155 c33e45869209
added
msc-projects-12.html
publications.html
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/msc-projects-12.html	Tue Nov 06 00:04:58 2012 +0000
@@ -0,0 +1,408 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<HEAD>
+<TITLE>2012/13 MSc Projects</TITLE>
+<BASE HREF="http://www.inf.kcl.ac.uk/staff/urbanc/">
+<script type="text/javascript" src="striper.js"></script>
+<link rel="stylesheet" href="nominal.css">
+</HEAD>
+<BODY TEXT="#000000" 
+      BGCOLOR="#4169E1" 
+      LINK="#0000EF" 
+      VLINK="#51188E" 
+      ALINK="#FF0000"
+      ONLOAD="striper('ul','striped','li','first,second')">
+
+
+
+<TABLE WIDTH="100%" 
+       BGCOLOR="#4169E1" 
+       BORDER="0"   
+       FRAME="border"  
+       CELLPADDING="10"     
+       CELLSPACING="2"
+       RULES="all">
+
+<TR>
+<TD BGCOLOR="#FFFFFF" 
+    WIDTH="75%" 
+    VALIGN="TOP">
+
+<H2>2012/13 MSc Projects</H2>
+<H4>Supervisor: Christian Urban</H4> 
+<H4>Email: christian dot urban at kcl dot ac dot uk,  Office: Strand Building S1.27</H4>
+<H4>If you are interested in a project, please send me an email and we can discuss details. Please include
+a short description about your programming skills and Computer Science background in your first email. 
+I will also need your King's username in order to book the project for you. Thanks.</H4> 
+
+<H4>Note that besides being a lecturer at the theoretical end of Computer Science, I am also a passionate
+    <A HREF="http://en.wikipedia.org/wiki/Hacker_(programmer_subculture)">hacker</A> &hellip;
+    defined as &ldquo;a person who enjoys exploring the details of programmable systems and 
+    stretching their capabilities, as opposed to most users, who prefer to learn only the minimum 
+    necessary.&rdquo; I am always happy to supervise like-minded students.</H4>  
+
+<ul class="striped">
+<li> <H4>[CU1] Regular Expression Matching and Partial Derivatives</H4>
+
+  <p>
+  <B>Description:</b>  
+  <A HREF="http://en.wikipedia.org/wiki/Regular_expression">Regular expressions</A> 
+  are extremely useful for many text-processing tasks, such as finding patterns in texts,
+  lexing programs, syntax highlighting and so on. Given that regular expressions were
+  introduced in the 1950s by <A HREF="http://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</A>, you might think 
+  regular expressions have since been studied to death. But you would definitely be mistaken: in fact they are still
+  an active research area. For example
+  <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">this paper</A> 
+  about regular expression matching and partial derivatives was presented this summer at the international 
+  PPDP'12 conference.</p>
+
+  <p>The background for this project is that some regular expressions are 
+  &quot;<A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">evil</A>&quot; 
+  and can &quot;stab you in the back&quot; according to
+  this recent <A HREF="http://tech.blog.cueup.com/regular-expressions-will-stab-you-in-the-back">blog post</A>.
+  For example, if you use the innocent-looking regular expression <code>a?{28}a{28}</code> in
+  <A HREF="http://www.python.org">Python</A> or 
+  in <A HREF="http://www.ruby-lang.org/en/">Ruby</A> (probably also in other mainstream programming languages) 
+  and match it, say, against the string 
+  <code>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</code>, you will soon notice that your CPU usage goes to 100%. In fact,
+  Python and Ruby need approximately 30 seconds for matching this string. You can try it for yourself:
+  <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re.py">re.py</A> (Python version) and 
+  <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re-internal.rb">re.rb</A> 
+  (Ruby version). You can imagine an attacker
+  mounting a nice <A HREF="http://en.wikipedia.org/wiki/Denial-of-service_attack">DoS attack</A> against 
+  your program if it contains such an &quot;evil&quot; regular expression. Actually 
+  <A HREF="http://www.scala-lang.org/">Scala</A> (and also Java) are almost immune to such
+  attacks, as they can deal with strings of up to 4,300 <code>a</code>s in less than a second. But if you scale
+  the regular expression and string further to, say, 4,600 <code>a</code>s, you get a <code>StackOverflowError</code> 
+  exception crashing your program.
+  </p>
+
+  <p>
+  On a rainy afternoon, I implemented 
+  <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re3.scala">this</A> 
+  regular expression matcher in Scala. It is not as fast as the official one in Scala, but
+  it can match up to 11,000 <code>a</code>s in less than 5 seconds without raising any exception
+  (remember Python and Ruby both need nearly 30 seconds to process 28(!) <code>a</code>s, and Scala's
+  official matcher maxes out at 4,600 <code>a</code>s). My matcher is approximately
+  85 lines of code and based on the concept of 
+  <A HREF="http://lambda-the-ultimate.org/node/2293">derivatives of regular expressions</A>.
+ Derivatives were introduced in 1964 by <A HREF="http://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)">
+  Janusz Brzozowski</A>, but according to this 
+  <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">paper</A> had been lost in the &quot;sands of time&quot;.
+  The advantage of derivatives is that they completely side-step the usual 
+  <A HREF="http://hackingoff.com/compilers/regular-expression-to-nfa-dfa">translations</A> of regular expressions
+  into NFAs or DFAs, which can introduce the exponential behaviour exhibited by the regular
+  expression matchers in Python and Ruby.
+  </p>
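+  <p>
+  To give a rough idea of how derivative-based matching works, here is a minimal sketch in Python.
+  It is not the Scala matcher linked above: the class names (<code>Zero</code>, <code>One</code>,
+  <code>Chr</code>, <code>Alt</code>, <code>Seq</code>, <code>Star</code>, <code>NTimes</code>) and the
+  functions <code>nullable</code>, <code>der</code>, <code>matches</code> and <code>evil</code> are
+  made up for this illustration. The sketch assumes a bounded-repetition constructor and a few
+  simplification rules, which keep the derivatives of <code>a?{n}a{n}</code> small.
+  </p>

```python
from dataclasses import dataclass


class Rexp:
    """Base class for regular expressions (all names here are illustrative)."""


@dataclass(frozen=True)
class Zero(Rexp):
    """Matches no string at all."""


@dataclass(frozen=True)
class One(Rexp):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Chr(Rexp):
    c: str


@dataclass(frozen=True)
class Alt(Rexp):
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Seq(Rexp):
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Star(Rexp):
    r: Rexp


@dataclass(frozen=True)
class NTimes(Rexp):
    r: Rexp
    n: int  # exactly n repetitions of r


def nullable(r):
    """Can r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    if isinstance(r, NTimes):
        return r.n == 0 or nullable(r.r)
    return False  # Zero and Chr


def alt(r1, r2):
    """Build r1 + r2, dropping Zero and adjacent duplicates."""
    if isinstance(r1, Zero):
        return r2
    if isinstance(r2, Zero) or r1 == r2:
        return r1
    return Alt(r1, r2)


def seq(r1, r2):
    """Build r1 . r2, simplifying with Zero and One."""
    if isinstance(r1, Zero) or isinstance(r2, Zero):
        return Zero()
    if isinstance(r1, One):
        return r2
    if isinstance(r2, One):
        return r1
    return Seq(r1, r2)


def der(c, r):
    """Derivative of r with respect to the character c."""
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = seq(der(c, r.r1), r.r2)
        return alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return seq(der(c, r.r), r)
    if isinstance(r, NTimes):
        return Zero() if r.n == 0 else seq(der(c, r.r), NTimes(r.r, r.n - 1))
    return Zero()  # Zero and One


def matches(r, s):
    """Take the derivative for each character, then test for nullability."""
    for c in s:
        r = der(c, r)
    return nullable(r)


def evil(n):
    """The regular expression a?{n}a{n} from the text."""
    return Seq(NTimes(Alt(Chr("a"), One()), n), NTimes(Chr("a"), n))
```

+  <p>
+  With these definitions, <code>matches(evil(28), "a" * 28)</code> needs only 28 derivative steps,
+  and no automaton is ever constructed.
+  </p>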
+
+  <p>
+  Now the guys from the 
+  <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">PPDP'12-paper</A> mentioned 
+  above claim they are even faster than me and can deal with even more features of regular expressions
+  (for example subexpression matching, which my rainy-afternoon matcher lacks). I am sure they thought
+  about the problem much longer than a single afternoon. The task 
+  in this project is to find out how good they actually are by implementing the results from their paper. 
+  Their approach is based on the concept of partial derivatives introduced in 1994 by
+  <A HREF="http://reference.kfupm.edu.sa/content/p/a/partial_derivatives_of_regular_expressio_1319383.pdf">Valentin Antimirov</A>.
+  I used them <A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Publications/rexp.pdf">once</A> 
+  in order to prove the <A HREF="http://en.wikipedia.org/wiki/Myhill–Nerode_theorem">Myhill-Nerode theorem</A>
+  by using only regular expressions.
+  </p>
+
+  <p>
+  <B>Literature:</B> 
+  The place to start with this project is obviously this
+  <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">paper</A>.
+  Traditional methods for regular expression matching are explained
+  in the Wikipedia articles 
+  <A HREF="http://en.wikipedia.org/wiki/DFA_minimization">here</A> and 
+  <A HREF="http://en.wikipedia.org/wiki/Powerset_construction">here</A>.
+  The authoritative <A HREF="http://infolab.stanford.edu/~ullman/ialc.html">book</A>
+  on automata and regular expressions is by John Hopcroft and Jeffrey Ullman (available in the library). 
+  There is also an online course about this topic by Ullman at 
+  <A HREF="https://www.coursera.org/course/automata">Coursera</A>, though IMHO not 
+  done with love. 
+  Finally, there are millions of other pointers about regular expression
+  matching on the Net. Test cases for &quot;<A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">evil</A>&quot;
+  regular expressions can be obtained from <A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">here</A>.
+  </p>
+
+  <p>
+  <B>Skills:</B> 
+  This is a project for a student with an interest in theory and some
+  reasonable programming skills. The project can be easily implemented
+  in languages like
+  <A HREF="http://www.scala-lang.org/">Scala</A>,
+  <A HREF="http://en.wikipedia.org/wiki/Standard_ML">ML</A>,  
+  <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>, 
+  <A HREF="http://www.python.org">Python</A>, etc.
+  </p>
+
+<!--
+<li> <H4>[CU2] Equivalence Checking of Regular Expressions</H4>
+
+  <p>
+  <B>Description:</b>  
+  Solving the problem of deciding the equivalence of regular expressions can be used
+  to decide a number of problems in automated reasoning. Recently, 
+  <A HREF="http://www.cs.unibo.it/~asperti/">Andrea Asperti</A>
+  proposed a simple method for deciding regular expression equivalence described
+  <A HREF="http://www.cs.unibo.it/~asperti/PAPERS/compact.pdf">here</A>. 
+  The task is to implement this method and test it on examples.
+  It would also be interesting to see whether Asperti's method applies to
+  extended regular expressions, described
+  <A HREF="http://ww2.cs.mu.oz.au/~sulzmann/manuscript/reg-exp-partial-derivatives.pdf">here</A>.
+  </p>
+
+  <p>
+  <B>Literature:</B> 
+  The central literature is obviously the papers
+  <A HREF="http://www.cs.unibo.it/~asperti/PAPERS/compact.pdf">here</A> and
+  <A HREF="http://ww2.cs.mu.oz.au/~sulzmann/manuscript/reg-exp-partial-derivatives.pdf">here</A>.
+  Asperti also has some slides <A HREF="http://www.cs.unibo.it/~asperti/SLIDES/regular.pdf">here</a>.
+  More references about regular expressions can be found
+  <A HREF="http://en.wikipedia.org/wiki/Regular_expression">here</A>. As in
+  [CU1], I will give a lot of the background of this project in
+  my Automata and Formal Languages course (6CCS3AFL).
+  </p>  
+
+  <p>
+  <B>Skills:</B> 
+  This is a project for a student with a passion for theory and some
+  reasonable programming skills. The project can be easily implemented
+  in languages like
+  <A HREF="http://www.scala-lang.org/">Scala</A>,
+  <A HREF="http://en.wikipedia.org/wiki/Standard_ML">ML</A>,  
+  <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>, 
+  <A HREF="http://www.python.org">Python</A>, etc.
+  Being able to read <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>
+  code is beneficial for the part involving extended regular expressions.
+  </p>
+-->
+
+<li> <H4>[CU3] Machine Code Generation for a Simple Compiler</H4>
+
+  <p>
+  <b>Description:</b> 
+  Compilers translate high-level programs that humans can read and write into
+  efficient machine code that can be run on a CPU or virtual machine.
+  I recently implemented a very simple compiler for a very simple functional
+  programming language following this 
+  <A HREF="http://www.cs.princeton.edu/~dpw/papers/tal-toplas.pdf">paper</A> 
+  (also described <A HREF="http://www.cs.princeton.edu/~dpw/papers/tal-tr.pdf">here</A>).
+  My code for this compiler, written in <A HREF="http://www.scala-lang.org/">Scala</A>, is 
+  <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/compiler.scala">here</A>.
+  The compiler can deal with simple programs involving natural numbers, such
+  as Fibonacci numbers
+  or factorial (but it can be easily extended - that is not the point).
+  </p>
+
+  <p>
+  While the hard work has been done (understanding the two papers above),
+  my compiler only produces some idealised machine code. For example I
+  assume there are infinitely many registers. The goal of this
+  project is to generate machine code that is more realistic and can
+  run on a CPU, like x86, or run on a virtual machine, say the JVM. 
+  This would probably give a thousandfold speedup in comparison to
+  my naive machine code and virtual machine. The project
+  requires digging into the literature about real CPUs and generating 
+  real machine code. 
+  </p>
+
+  <p>
+  <B>Literature:</B>
+  There is a lot of literature about compilers 
+  (for example <A HREF="http://www.cs.princeton.edu/~appel/papers/cwc.html">this book</A> -
+  I can lend you my copy for the duration of the project). A very good overview article
+  about implementing compilers by 
+  <A HREF="http://tratt.net/laurie/">Laurie Tratt</A> is 
+  <A HREF="http://tratt.net/laurie/tech_articles/articles/how_difficult_is_it_to_write_a_compiler">here</A>.
+  An introduction into x86 machine code is <A HREF="http://ianseyler.github.com/easy_x86-64/">here</A>.
+  Intel's official manual for the x86 instruction set is 
+  <A HREF="http://download.intel.com/design/intarch/manuals/24319101.pdf">here</A>. 
+  A simple assembler for the JVM is described <A HREF="http://jasmin.sourceforge.net">here</A>.
+  An interesting twist on this project would be to generate code not for a CPU, but
+  for the intermediate language of the <A HREF="http://llvm.org">LLVM</A> compiler
+  (also described <A HREF="https://wiki.aalto.fi/display/t1065450/LLVM+IR">here</A> and
+  <A HREF="http://llvm.org/docs/LangRef.html">here</A>). If you want to see
+  what machine code looks like, you can compile your C program using <code>gcc -S</code>.
+  </p>
+
+  <p>
+  <B>Skills:</B> 
+  This is a project for a student with a deep interest in programming languages and
+  compilers. Since my compiler is implemented in <A HREF="http://www.scala-lang.org/">Scala</A>,
+  it would make sense to continue this project in this language. I can be
+  of help with questions and books about <A HREF="http://www.scala-lang.org/">Scala</A>.
+  But if Scala is a problem, my code can also be translated quickly into any other functional
+  language. 
+  </p>
+
+<li> <H4>[CU4] Implementation of Register Spilling Algorithms</H4>
+  
+  <p>
+  <b>Description:</b> 
+  This project is similar to [CU3]. The emphasis here, however, is on the
+  implementation and comparison of register spilling algorithms, also often called register allocation 
+  algorithms. They are part of any respectable compiler. As said
+  in [CU3], however, my simple compiler lacks them and assumes an infinite number of registers instead.
+  Real CPUs, however, only provide a fixed number of registers (for example
+  x86-64 has 16 general purpose registers). Whenever a program needs
+  to hold more values than there are registers, the values need to be &ldquo;spilled&rdquo;
+  into the main memory. Register spilling algorithms try to minimise
+  this spilling, since fetching values from main memory is a costly 
+  operation. 
+  </p>
+
+  <p>
+  The classic algorithm for register spilling uses a
+  <A HREF="http://en.wikipedia.org/wiki/Register_allocation">graph-colouring method</A>.
+  However, for some time the <A HREF="http://llvm.org">LLVM</A> compiler
+  used a supposedly more efficient method, called the linear scan allocation method
+  (described 
+  <A HREF="http://www.cs.ucla.edu/~palsberg/course/cs132/linearscan.pdf">here</A>).
+  However, it was later decided to abandon this method in favour of 
+  a <A HREF="http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html">
+  greedy register allocation</A> method. It would be nice if this project could find out
+  what the issues are with these methods and implement at least one of them for the 
+  simple compiler referenced in [CU3].
+  </p>
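+  <p>
+  As a rough illustration of the linear scan idea, here is a minimal Python sketch. It assumes that
+  the live interval of every variable has already been computed and is given as a pair
+  <code>(start, end)</code>; the function name <code>linear_scan</code> and its data layout are
+  invented for this example. It is a simplified reading of the method in the paper linked above,
+  not the code used in LLVM.
+  </p>

```python
def linear_scan(intervals, num_regs):
    """Assign registers 0..num_regs-1 to live intervals, spilling when none is free.

    intervals maps a variable name to (start, end), both inclusive.
    Returns (reg_of, spilled): a dict from variable to register, and the
    set of variables that have to live in memory.
    """
    free = list(range(num_regs))
    active = []      # (end, name) pairs that currently occupy a register
    reg_of = {}
    spilled = set()
    # Visit the intervals in order of increasing start point.
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts.
        remaining = []
        for old_end, old in active:
            if start > old_end:
                free.append(reg_of[old])
            else:
                remaining.append((old_end, old))
        active = remaining
        if free:
            reg_of[name] = free.pop()
            active.append((end, name))
        elif active and max(active)[0] > end:
            # Spill the active interval that ends last and reuse its register.
            last_end, last = max(active)
            reg_of[name] = reg_of.pop(last)
            spilled.add(last)
            active.remove((last_end, last))
            active.append((end, name))
        else:
            # The new interval itself ends last (or there are no registers).
            spilled.add(name)
    return reg_of, spilled
```

+  <p>
+  For instance, with the intervals <code>{"a": (0, 10), "b": (1, 3), "c": (2, 8), "d": (4, 6)}</code>
+  and two registers, the sketch spills <code>a</code>, the variable with the longest remaining
+  lifetime, and keeps the other three in registers.
+  </p>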
+
+  <p>
+  <B>Literature:</B> 
+  The graph colouring method is described in Andrew Appel's 
+  <A HREF="http://www.cs.princeton.edu/~appel/modern/java/">book</A> on compilers
+  (I can give you my copy of this book, if it is not available in the library).
+  There is also a survey 
+  <A HREF="http://compilers.cs.ucla.edu/fernando/publications/drafts/survey.pdf">article</A> 
+  about register allocation algorithms with further pointers.
+  </p>
+
+  <p>
+  <B>Skills:</B> 
+  Same skills as [CU3].
+  </p>
+
+<li> <H4>[CU5] A Student Polling System</H4>
+
+  <p>
+  <B>Description:</B>
+  One of the more annoying aspects of giving a lecture is asking the students a question
+  and, no matter how easy the question is, not 
+  receiving an answer. Recently, the online course system 
+  <A HREF="http://www.udacity.com">Udacity</A> made an art out of
+  asking questions during lectures (see for example the
+  <A HREF="http://www.udacity.com/overview/Course/cs253/CourseRev/apr2012">Web Application Engineering</A> 
+  course CS253).
+  The lecturer there gives multiple-choice questions as part of the lecture and the students need to 
+  click on the appropriate answer. This works very well in the online world. 
+  For  &ldquo;real-world&rdquo; lectures, the department has some 
+  <A HREF="http://en.wikipedia.org/wiki/Audience_response">clickers</A>
+  (these are little devices that are part of an audience response system). However, 
+  they are a logistic nightmare for the lecturer: they need to be distributed 
+  during the lecture and collected at the end. Nowadays, when students
+  come with their own laptop or smartphone to lectures, this can
+  be improved.
+  </p>
+
+  <p>
+  The task of this project is to implement an online student
+  polling system. The lecturer should be able to prepare 
+  questions beforehand (encoded as some web-form) and be able to 
+  show them during the lecture. The students
+  can give their answers by clicking on the corresponding webpage.
+  The lecturer can then collect the responses online and evaluate them 
+  immediately. Such a system is sometimes called
+  <A HREF="http://en.wikipedia.org/wiki/Audience_response#Smartphone_.2F_HTTP_voting">HTTP voting</A>. 
+  There are a number of commercial
+  solutions for this problem, but they are not easy to use (in addition
+  to being ridiculously expensive). A good student can easily improve upon
+  what they provide. 
+  </p>
+
+  <p>
+  The problem of student polling is not as hard as 
+  <A HREF="http://en.wikipedia.org/wiki/Electronic_voting">electronic voting</A>, 
+  which essentially is still an unsolved problem in Computer Science. The
+  students only need to be prevented from answering a question more than once and thus skewing
+  any statistics. Unlike electronic voting, no audit trail needs to be kept
+  for student polling. Restricting the number of answers can probably be solved 
+  by setting appropriate cookies on the students'
+  computers or smartphones.
+  </p>
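+  <p>
+  A minimal sketch of the cookie idea, using only Python's standard library, might look as follows.
+  The class <code>Poll</code>, the cookie name <code>poll_token</code> and the in-memory storage are
+  assumptions made for this illustration. A real system would sit behind a web framework, and a
+  student who deletes the cookie could still answer again, so this only deters accidental or casual
+  double answers.
+  </p>

```python
import secrets
from http.cookies import SimpleCookie


class Poll:
    """One multiple-choice question; all state is kept in memory (illustrative only)."""

    def __init__(self, options):
        self.counts = {option: 0 for option in options}
        self.voted = set()  # tokens that have already answered

    def vote(self, cookie_header, option):
        """Record an answer.

        cookie_header is the value of the request's Cookie header (or None).
        Returns (accepted, set_cookie), where set_cookie is the value for a
        Set-Cookie response header, or None if the browser already has a token.
        """
        cookie = SimpleCookie(cookie_header or "")
        set_cookie = None
        if "poll_token" in cookie:
            token = cookie["poll_token"].value
        else:
            # First contact: hand out a fresh random token.
            token = secrets.token_hex(16)
            fresh = SimpleCookie()
            fresh["poll_token"] = token
            set_cookie = fresh.output(header="").strip()
        if token in self.voted or option not in self.counts:
            return False, set_cookie
        self.voted.add(token)
        self.counts[option] += 1
        return True, set_cookie
```
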
+
+  <p>
+  <B>Literature:</B> 
+  The project requires fluency in a web-programming language (for example 
+  <A HREF="http://en.wikipedia.org/wiki/JavaScript">JavaScript</A>,
+  <A HREF="http://en.wikipedia.org/wiki/PHP">PHP</A>, 
+  Java, <A HREF="http://www.python.org">Python</A>, 
+  <A HREF="http://en.wikipedia.org/wiki/Go_(programming_language)">Go</A>, 
+  <A HREF="http://www.scala-lang.org/">Scala</A>,
+  <A HREF="http://en.wikipedia.org/wiki/Ruby_(programming_language)">Ruby</A>) 
+  and possibly a cloud application platform (for example
+  <A HREF="https://developers.google.com/appengine/">Google App Engine</a> or 
+  <A HREF="http://www.heroku.com">Heroku</A>).
+  For web-programming the 
+  <A HREF="http://www.udacity.com/overview/Course/cs253/CourseRev/apr2012">Web Application Engineering</A>
+  course at <A HREF="http://www.udacity.com">Udacity</A> is a good starting point 
+  to be aware of the issues involved. This course uses <A HREF="http://www.python.org">Python</A>.
+  To evaluate the answers from the students, Google's 
+  <A HREF="https://developers.google.com/chart/image/docs/making_charts">Chart Tools</A>
+  might be useful, which are also described in this 
+  <A HREF="http://www.youtube.com/watch?v=NZtgT4jgnE8">YouTube</A> video.
+  </p>
+
+  <p>
+  <B>Skills:</B> 
+  In order to provide convenience for the lecturer, this project needs very good web-programming skills. A 
+  <A HREF="http://en.wikipedia.org/wiki/Hacker_(programmer_subculture)">hacker mentality</A>
+  (see above) is probably very beneficial: web-programming is an area that only emerged recently and
+  many tools still lack maturity. You probably have to experiment a lot with several different
+  languages and tools.
+  </p>
+
+<li> <H4>[CU6] Implementation of a Distributed Clock-Synchronisation Algorithm developed at NASA</H4>
+  
+  <p>
+  <B>Description:</B>
+  There are many algorithms for synchronising clocks. This
+  <A HREF="http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20120000054_2011025573.pdf">paper</A> 
+  describes a new algorithm for clocks that communicate by exchanging
+  messages and thereby reach a state in which (within some bound) all clocks are synchronised.
+  A slightly longer and more detailed paper about the algorithm is 
+  <A HREF="http://hdl.handle.net/2060/20110020812">here</A>.
+  The point of this project is to implement this algorithm and simulate networks of clocks.
+  </p>
+
+  <p>
+  <B>Literature:</B> 
+  There is a wide range of literature on clock synchronisation algorithms. 
+  Some pointers are given in this
+  <A HREF="http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20120000054_2011025573.pdf">paper</A>,
+  which describes the algorithm to be implemented in this project. Pointers
+  are also given <A HREF="http://en.wikipedia.org/wiki/Clock_synchronization">here</A>.
+  </p>
+
+  <p>
+  <B>Skills:</B> 
+  In order to implement a simulation of a network of clocks, you need to tackle
+  concurrency. You can do this for example in the programming language
+  <A HREF="http://www.scala-lang.org/">Scala</A> with the help of the 
+  <A HREF="http://akka.io">Akka</a> library. This library enables you to send messages
+  between different <I>actors</I>. <A HREF="http://www.scala-lang.org/node/242">Here</A> 
+  are some examples that explain how to implement exchanging messages between actors. 
+  </p>
+
+</ul>
+</TD>
+</TR>
+</TABLE>
+
+<P>
+<!-- Created: Tue Mar  4 00:23:25 GMT 1997 -->
+<!-- hhmts start -->
+Last modified: Wed Sep 12 16:30:03 GMT 2012
+<!-- hhmts end -->
+<a href="http://validator.w3.org/check/referer">[Validate this page.]</a>
+</BODY>
+</HTML>
--- a/publications.html	Sun Nov 04 08:45:04 2012 +0000
+++ b/publications.html	Tue Nov 06 00:04:58 2012 +0000
@@ -279,7 +279,7 @@
 <p><B>Nominal Logic Programming.</B> (with Cheney)
       In <A HREF="http://dl.acm.org/citation.cfm?id=1387675">
       ACM Transactions on Programming Languages and Systems</A>, 
-      2008, Vol. 30(5), pages 26:1-26:47.
+      2008, Vol. 30(5), Pages 26:1-26:47.
 </TD>
 </TR>