+-  
[CU1] Regular Expression Matching and Partial Derivatives
+
+  
+  Description:  
+  Regular expressions 
+  are extremely useful for many text-processing tasks such as finding patterns in texts,
+  lexing programs, syntax highlighting and so on. Given that regular expressions were
+  introduced in the 1950s by Stephen Kleene,
+  you might think regular expressions have since been studied and implemented to death. But you would definitely be
+  mistaken: in fact they are still an active research area. For example
+  this paper 
+  about regular expression matching and partial derivatives was presented last summer at the international 
+  PPDP'12 conference. The authors are even working on a follow-up paper that has not yet been presented at any
+  conference. The task in this project is to implement their results.
+
+  The background for this project is that some regular expressions are 
+  “evil”
+  and can “stab you in the back” according to
+  this blog post.
+  For example, if you take the innocent-looking regular expression a?{28}a{28} in Python or 
+  in Ruby (and probably also in other mainstream programming languages) and match it, say, against the string 
+  aaaaaaaaaaaaaaaaaaaaaaaaaaaa (that is, 28 a's), you will soon notice that your CPU usage goes to 100%. In fact,
+  Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself:
+  re.py (Python version) and 
+  re.rb 
+  (Ruby version). You can imagine an attacker
+  mounting a nice DoS attack against 
+  your program if it contains such an “evil” regular expression. In contrast, 
+  Scala (and also Java) are almost immune to such
+  attacks, as they can deal with strings of up to 4,300 a's in less than a second. But if you scale
+  the regular expression and string further to, say, 4,600 a's, then you get a StackOverflowError, 
+  potentially crashing your program.
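
The slowdown described above can be observed with a few lines of Python. This is only a minimal sketch and not the re.py script linked above: it assumes the pattern is written with explicit grouping as (a?){n}a{n}, the helper names evil_pattern and time_match are made up for this illustration, and n is kept small so that the script finishes quickly.

```python
import re
import time

def evil_pattern(n):
    """Build the pattern (a?){n}a{n} (hypothetical helper for this sketch)."""
    return "(a?){%d}a{%d}" % (n, n)

def time_match(n):
    """Match n copies of 'a' against the pattern; return (matched, seconds)."""
    start = time.perf_counter()
    matched = re.fullmatch(evil_pattern(n), "a" * n) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    # Keep n small: with a backtracking matcher the running time is
    # expected to grow very quickly as n increases.
    for n in (10, 14, 18):
        matched, seconds = time_match(n)
        print(f"n={n:2d} matched={matched} time={seconds:.3f}s")
```

Increasing n step by step and watching the reported times is a simple way to see how quickly the matching time grows on a given machine.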
+  
+
+  
+  On a rainy afternoon, I implemented 
+  this 
+  regular expression matcher in Scala. It is not as fast as the official one in Scala, but
+  it can match up to 11,000 a's in less than 5 seconds without raising any exception
+  (remember Python and Ruby both need nearly 30 seconds to process 28(!) a's, and Scala's
+  official matcher maxes out at 4,600 a's). My matcher is approximately
+  85 lines of code and based on the concept of 
+  derivatives of regular expressions.
+  These derivatives were introduced in 1964 by 
+  Janusz Brzozowski, but according to this 
+  paper had been lost in the “sands of time”.
+  The advantage of derivatives is that they side-step completely the usual 
+  translations of regular expressions
+  into NFAs or DFAs, which can introduce the exponential behaviour exhibited by the regular
+  expression matchers in Python and Ruby.
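
To make the idea of derivatives concrete, here is a minimal sketch of a derivative-based matcher in Python. It is not the 85-line Scala matcher mentioned above. The class names, the bounded repetition constructor NTimes, the simplifying constructors alt and seq, and the helper evil are all assumptions made for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Zero: pass                # matches no string

@dataclass(frozen=True)
class One: pass                 # matches only the empty string

@dataclass(frozen=True)
class Chr:                      # matches the single character c
    c: str

@dataclass(frozen=True)
class Alt:                      # r1 | r2
    r1: object
    r2: object

@dataclass(frozen=True)
class Seq:                      # r1 followed by r2
    r1: object
    r2: object

@dataclass(frozen=True)
class Star:                     # r*
    r: object

@dataclass(frozen=True)
class NTimes:                   # r{n}
    r: object
    n: int

def nullable(r):
    """Can r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, (Zero, Chr)):
        return False
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return r.n == 0 or nullable(r.r)            # NTimes

def alt(r1, r2):
    """Build r1 | r2, simplifying the obvious cases."""
    if r1 == Zero():
        return r2
    if r2 == Zero() or r1 == r2:
        return r1
    return Alt(r1, r2)

def seq(r1, r2):
    """Build r1 followed by r2, simplifying the obvious cases."""
    if r1 == Zero() or r2 == Zero():
        return Zero()
    if r1 == One():
        return r2
    if r2 == One():
        return r1
    return Seq(r1, r2)

def der(c, r):
    """Derivative of r with respect to the character c."""
    if isinstance(r, (Zero, One)):
        return Zero()
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = seq(der(c, r.r1), r.r2)
        return alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return seq(der(c, r.r), r)
    if r.n == 0:                                # NTimes with nothing left
        return Zero()
    rest = One() if r.n == 1 else NTimes(r.r, r.n - 1)
    return seq(der(c, r.r), rest)

def matches(r, s):
    """Take the derivative for every character, then test for nullability."""
    for c in s:
        r = der(c, r)
    return nullable(r)

def evil(n):
    """The regular expression a?{n}a{n} from the text."""
    return Seq(NTimes(Alt(Chr("a"), One()), n), NTimes(Chr("a"), n))
```

With these definitions, matches(evil(28), "a" * 28) needs no translation into an automaton. The simplifications in alt and seq matter in this sketch, since without them the derivatives can grow very large.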
+  
+
+  
+  Now the authors from the 
+  PPDP'12-paper mentioned 
+  above claim they are even faster than me and can deal with even more features of regular expressions
+  (for example subexpression matching, which my rainy-afternoon matcher cannot). I am sure they thought
+  about the problem much longer than a single afternoon. The task 
+  in this project is to find out how good they actually are by implementing the results from their paper. 
+  Their approach is based on the concept of partial derivatives introduced in 1994 by
+  Valentin Antimirov.
+  I used them once myself in a paper 
+  in order to prove the Myhill-Nerode theorem.
+  So I know they are worth their money. Still, it would be interesting to actually compare their results
+  with my simple rainy-afternoon matcher and potentially “blow away” the regular expression matchers 
+  in Python and Ruby (and possibly in Scala too).
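
For comparison, partial derivatives can be sketched in the same style. The following is a minimal illustration of matching with sets of partial derivatives, under the assumption of a small regular expression datatype invented for this example. It does not implement the sub-matching from the PPDP'12 paper, only plain yes/no matching.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class One: pass                 # matches only the empty string

@dataclass(frozen=True)
class Chr:                      # matches the single character c
    c: str

@dataclass(frozen=True)
class Alt:                      # r1 | r2
    r1: object
    r2: object

@dataclass(frozen=True)
class Seq:                      # r1 followed by r2
    r1: object
    r2: object

@dataclass(frozen=True)
class Star:                     # r*
    r: object

def nullable(r):
    """Can r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Chr):
        return False
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    return nullable(r.r1) and nullable(r.r2)    # Seq

def seq(r1, r2):
    """Build r1 followed by r2, dropping a leading One."""
    return r2 if r1 == One() else Seq(r1, r2)

def pder(c, r):
    """Partial derivative of r with respect to c: a set of regular expressions."""
    if isinstance(r, One):
        return frozenset()
    if isinstance(r, Chr):
        return frozenset([One()]) if r.c == c else frozenset()
    if isinstance(r, Alt):
        return pder(c, r.r1) | pder(c, r.r2)
    if isinstance(r, Seq):
        left = frozenset(seq(p, r.r2) for p in pder(c, r.r1))
        return left | pder(c, r.r2) if nullable(r.r1) else left
    return frozenset(seq(p, r) for p in pder(c, r.r))   # Star

def matches(r, s):
    """Track the set of partial derivatives reached after each character."""
    states = frozenset([r])
    for c in s:
        states = frozenset(q for p in states for q in pder(c, p))
    return any(nullable(p) for p in states)
```

Instead of one regular expression that may grow, the matcher keeps a set of smaller ones, which is why no simplification of alternatives is needed in this sketch.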
+  
+
+  
+  Literature: 
+  The place to start with this project is obviously this
+  paper.
+  Traditional methods for regular expression matching are explained
+  in the Wikipedia articles 
+  here and 
+  here.
+  The authoritative book
+  on automata and regular expressions is by John Hopcroft and Jeffrey Ullman (available in the library). 
+  There is also an online course about this topic by Ullman at 
+  Coursera, though IMHO not 
+  done with love. 
+  Finally, there are millions of other pointers about regular expression
+  matching on the Web. I found the chapter on Lexing in this
+  online book very helpful.
+  Test cases for “evil”
+  regular expressions can be obtained from here.
+  
+
+  
+  Skills: 
+  This is a project for a student with an interest in theory and some
+  reasonable programming skills. The project can be easily implemented
+  in functional languages like
+  Scala,
+  F#, 
+  ML,  
+  Haskell, etc. Python and other non-functional languages
+  can also be used, but seem 
+  
+  
+ -  
[CU2] Machine Code Generation for a Simple Compiler
+
+  
+  Description: 
+  Compilers translate high-level programs that humans can read and write into
+  efficient machine code that can be run on a CPU or virtual machine.
+  A compiler for a simple functional language generating X86 code is described
+  here.
+  I recently implemented a very simple compiler for an even simpler functional
+  programming language following this 
+  paper 
+  (also described here).
+  My Scala code for this compiler is 
+  here.
+  The compiler can deal with simple programs involving natural numbers, such
+  as Fibonacci numbers or factorial (but it can be easily extended - that is not the point).
+  
+
+  
+  While the hard work has been done (understanding the two papers above),
+  my compiler only produces some idealised machine code. For example I
+  assume there are infinitely many registers. The goal of this
+  project is to generate machine code that is more realistic and can
+  run on a CPU, like X86, or run on a virtual machine, say the JVM. 
+  This probably gives a speedup of a thousand times in comparison to
+  my naive machine code and virtual machine. The project
+  requires digging into the literature about real CPUs and generating 
+  real machine code. 
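
To illustrate what “idealised machine code” with infinitely many registers can look like, here is a minimal sketch in Python. It is not my Scala compiler: the tuple-based expression language, the instruction names li, add, sub and mul, and the functions compile_program and run are all assumptions made for this illustration.

```python
import itertools

def compile_expr(expr, code, fresh):
    """Append instructions computing expr to code; return the result register."""
    kind = expr[0]
    if kind == "num":
        reg = next(fresh)                       # always take a brand-new register
        code.append(("li", reg, expr[1]))       # load immediate
        return reg
    left = compile_expr(expr[1], code, fresh)
    right = compile_expr(expr[2], code, fresh)
    reg = next(fresh)
    code.append((kind, reg, left, right))       # kind is "add", "sub" or "mul"
    return reg

def compile_program(expr):
    """Compile an expression; return the instruction list and result register."""
    code = []
    result = compile_expr(expr, code, itertools.count())
    return code, result

def run(code, result):
    """A tiny interpreter for the idealised machine: registers live in a dict."""
    ops = {"add": lambda a, b: a + b,
           "sub": lambda a, b: a - b,
           "mul": lambda a, b: a * b}
    regs = {}
    for instr in code:
        if instr[0] == "li":
            regs[instr[1]] = instr[2]
        else:
            op, dst, a, b = instr
            regs[dst] = ops[op](regs[a], regs[b])
    return regs[result]

if __name__ == "__main__":
    program = ("mul", ("add", ("num", 2), ("num", 3)), ("num", 4))
    code, result = compile_program(program)
    for instr in code:
        print(instr)
    print("result:", run(code, result))
```

Every intermediate value gets its own register here, which keeps the compiler simple. A realistic backend has to map these onto the few registers of a real CPU or onto the operand stack of a virtual machine.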
+  
+
+  
+  Literature:
+  There is a lot of literature about compilers 
+  (for example this book -
+  I can lend you my copy for the duration of the project, or this
+  online book). A very good overview article
+  about implementing compilers by 
+  Laurie Tratt is 
+  here.
+  An online book about the Art of Assembly Language is
+  here.
+  An introduction into x86 machine code is here.
+  Intel's official manual for the x86 instruction set is 
+  here. 
+  A simple assembler for the JVM is described here.
+  An interesting twist of this project is to not generate code for a CPU, but
+  for the intermediate language of the LLVM compiler
+  (also described here and
+  here). If you want to see
+  what machine code looks like you can compile your C-program using gcc -S.
+  
+
+  
+  Skills: 
+  This is a project for a student with a deep interest in programming languages and
+  compilers. Since my compiler is implemented in Scala,
+  it would make sense to continue this project in this language. I can be
+  of help with questions and books about Scala.
+  But if Scala is a problem, my code can also be translated quickly into any other functional
+  language. 
+  
+
+ -  
[CU3] Language Translator into JavaScript
+
+  
+  Description: 
+  JavaScript is a language that is supported by most
+  browsers and therefore is a favourite
+  vehicle for Web-programming. Some call it the scripting language of the Web.
+  Unfortunately, JavaScript is probably one of the worst
+  languages to program in (being designed and released in a hurry). But it can be used as a convenient target
+  for translating programs from other languages. In particular there are two
+  technologies that can be used for this purpose:
+  one is asm.js, a highly optimisable subset of JavaScript, and the other is
+  emscripten, a compiler from LLVM code to JavaScript.
+  There is a tutorial for emscripten
+  and an impressive demo which runs the
+  Unreal Engine 3
+  in a browser with spectacular speed. This was achieved by compiling the
+  C-code of the Unreal Engine to the LLVM intermediate language and then translating the LLVM
+  code to JavaScript.
+  
+
+  
+  Skills: 
+  This project is about exploring these two technologies and implementing a translator
+  of a small language into JavaScript.  This is similar to the project [CU2] above and requires
+  similar skills. In addition it would be good to have already some familiarity with JavaScript.
+  There are plenty of tutorials on the Web.
+  Here is a list of free books on JavaScript.
+  This is a project for a student who wants to get more familiar with JavaScript and Web-programming.
+  A project from which you can draw inspiration is this
+  List-to-JavaScript
+  translator. Here is another such project.
+  And another in less than 100 lines of code.
+  CoffeeScript is a similar project
+  except that it is already quite mature. And finally, not to
+  forget TypeScript, developed by Microsoft.
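
As a starting point, a translator from a small language into JavaScript can be sketched in a few lines. The following Python sketch assumes a made-up source language of nested tuples and emits plain JavaScript source text. It produces neither asm.js annotations nor anything emscripten-specific, and the function names to_js and function_to_js are hypothetical.

```python
def to_js(expr):
    """Translate a tiny expression language (nested tuples) into JavaScript source."""
    if isinstance(expr, bool):                  # test bool before int
        return "true" if expr else "false"
    if isinstance(expr, (int, float)):
        return repr(expr)
    if isinstance(expr, str):
        return expr                             # a variable name
    op = expr[0]
    if op in ("+", "-", "*", "<", "==="):
        return f"({to_js(expr[1])} {op} {to_js(expr[2])})"
    if op == "if":
        return f"({to_js(expr[1])} ? {to_js(expr[2])} : {to_js(expr[3])})"
    if op == "call":
        args = ", ".join(to_js(a) for a in expr[2:])
        return f"{expr[1]}({args})"
    raise ValueError(f"unknown construct: {op!r}")

def function_to_js(name, params, body):
    """Wrap a translated expression into a JavaScript function definition."""
    return f"function {name}({', '.join(params)}) {{ return {to_js(body)}; }}"

if __name__ == "__main__":
    fact = ("if", ("<", "n", 1), 1,
            ("*", "n", ("call", "fact", ("-", "n", 1))))
    print(function_to_js("fact", ["n"], fact))
```

The generated text can be pasted into a browser console. Parenthesising every subexpression is a deliberate simplification that avoids having to reason about JavaScript's operator precedence.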
+
+
+ -  
[CU4] Slide-Making in the Web-Age
+
+  
+  Description:
+  The standard technology for writing scientific papers in Computer Science is to use
+  LaTeX, a document preparation
+  system originally implemented by Donald Knuth
+  and Leslie Lamport.
+  LaTeX produces very pleasant-looking documents, can deal nicely with mathematical
+  formulas and is very flexible. If you are interested here
+  is a side-by-side comparison between Word and LaTeX (which LaTeX “wins” with 18 out of 21 points).
+  Computer scientists not only use LaTeX for documents,
+  but also for slides (really, nobody who wants to be cool uses Keynote or Powerpoint).
+  
+
+  
+  Although used widely, LaTeX seems nowadays a bit dated for producing
+  slides. Unlike documents, which are typically “static” and published in a book or journal,
+  slides often contain changing contents that might first only be partially visible and
+  only later revealed as the “story” of a talk or lecture demands.
+  Also slides often contain animated algorithms where each state in the
+  calculation is best explained by highlighting the changing data.
+  
+
+  
+  It seems HTML and JavaScript are much better suited for generating
+  such animated slides. This page
+  links to 22 slide-generating programs using this combination of technologies. 
+  Here are even more such
+  projects. However, the problem with all of these projects is that they depend heavily on the users being
+  able to write JavaScript, CSS or HTML...not something one would like to depend on given that
+  “normal” users likely only have a LaTeX background. The aim of this project is to invent a
+  very simple language that is inspired by LaTeX and then generate from code written in this language
+  slides that can be displayed in a web-browser.
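
To give an impression of the kind of translation meant here, the following Python sketch turns a tiny LaTeX-inspired input into HTML. The input syntax (a Beamer-like frame environment with \item lines) and the function name slides_to_html are assumptions made for this illustration. The actual language design is part of the project.

```python
import html
import re

def slides_to_html(source):
    """Translate a tiny LaTeX-inspired slide language into HTML sections."""
    out, in_list = [], False

    def close_list():
        nonlocal in_list
        if in_list:
            out.append("</ul>")
            in_list = False

    for line in source.splitlines():
        line = line.strip()
        frame = re.fullmatch(r"\\begin\{frame\}\{(.*)\}", line)
        if frame:
            out.append(f"<section><h1>{html.escape(frame.group(1))}</h1>")
        elif line.startswith(r"\item "):
            if not in_list:                     # open the list on the first item
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{html.escape(line[6:])}</li>")
        elif line == r"\end{frame}":
            close_list()
            out.append("</section>")
        elif line:                              # any other text is a paragraph
            close_list()
            out.append(f"<p>{html.escape(line)}</p>")
    return "\n".join(out)

if __name__ == "__main__":
    demo = "\n".join([r"\begin{frame}{Derivatives}",
                      r"\item introduced in 1964",
                      r"\item avoid automata",
                      r"\end{frame}"])
    print(slides_to_html(demo))
```

Each frame becomes one section element, which a small amount of JavaScript and CSS could then show one at a time.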
+  
+
+ 
+ This sounds complicated, but there is already some help available:
+ MathJax is a JavaScript library that can
+ be used to display mathematical text, for example
+
+ 
+ When \(a \ne 0\), there are two solutions to \(ax^2 + bx + c = 0\) and they are
+ \(x = {-b \pm \sqrt{b^2-4ac} \over 2a}\).
+ 
+
+ by writing code in the familiar LaTeX-way. This can be reused. There are also plenty of JavaScript
+ libraries for graphical animations (for example
+ Raphaël,
+ SVG.JS,
+ Bonsaijs,
+ JSXGraph). The inspiration for how the user should be able to write
+ slides could come from the LaTeX packages Beamer
+ and PGF/TikZ.
+ 
+
+  
+  Skills: 
+  This project requires good knowledge of JavaScript. You need to be able to
+  parse a language and translate it to a suitable part of JavaScript using
+  appropriate libraries. Tutorials for JavaScript are here.
+  A parser generator for JavaScript is here. There are probably also
+  others.
+  
+
+ -  
[CU5] An Online-Student Voting System
+
+  
+  Description:
+  One of the more annoying aspects of giving a lecture is asking the students a question
+  and, no matter how easy the question is, not 
+  receiving an answer. Recently, the online course system 
+  Udacity made an art out of
+  asking questions during lectures (see for example the
+  Web Application Engineering 
+  course CS253).
+  The lecturer there gives multiple-choice questions as part of the lecture and the students need to 
+  click on the appropriate answer. This works very well in the online world. 
+  For  “real-world” lectures, the department has some 
+  clickers
+  (these are little devices that are part of an audience response system). However, 
+  they are a logistical nightmare for the lecturer: they need to be distributed 
+  during the lecture and collected at the end. Nowadays, when students
+  come with their own laptop or smartphone to lectures, this can
+  be improved.
+  
+
+  
+  The task of this project is to implement an online student
+  polling system. The lecturer should be able to prepare 
+  questions beforehand (encoded as some web-form) and be able to 
+  show them during the lecture. The students
+  can give their answers by clicking on the corresponding webpage.
+  The lecturer can then collect the responses online and evaluate them 
+  immediately. Such a system is sometimes called
+  HTML voting. 
+  There are a number of commercial
+  solutions for this problem, but they are not easy to use (in addition
+  to being ridiculously expensive). A good student can easily improve upon
+  what they provide. 
+  
+
+  
+  The problem of student polling is not as hard as 
+  electronic voting, 
+  which essentially is still an unsolved problem in Computer Science. The
+  students only need to be prevented from answering a question more than once, which would skew
+  the statistics. Unlike electronic voting, no audit trail needs to be kept
+  for student polling. Restricting the number of answers can probably be solved 
+  by setting appropriate cookies on the students'
+  computers or smartphones.
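
The cookie idea can be sketched independently of any web framework. The following Python sketch assumes a made-up Poll class and a cookie named voter. It only shows the bookkeeping that a real server would perform on each request, and it does not stop a student who deletes the cookie.

```python
import uuid
from collections import Counter
from http.cookies import SimpleCookie

class Poll:
    """One multiple-choice question with at most one answer per cookie token."""

    def __init__(self, options):
        self.options = list(options)
        self.counts = Counter()
        self.voted = set()                      # tokens that have already answered

    def vote(self, cookie_header, answer):
        """Record an answer; return (accepted, Set-Cookie value or None)."""
        if answer not in self.options:
            return False, None
        cookie = SimpleCookie(cookie_header or "")
        token = cookie["voter"].value if "voter" in cookie else None
        set_cookie = None
        if token is None:                       # first visit: hand out a token
            token = uuid.uuid4().hex
            fresh = SimpleCookie()
            fresh["voter"] = token
            set_cookie = fresh["voter"].OutputString()
        if token in self.voted:
            return False, set_cookie
        self.voted.add(token)
        self.counts[answer] += 1
        return True, set_cookie

if __name__ == "__main__":
    poll = Poll(["A", "B", "C"])
    accepted, header = poll.vote(None, "A")     # first answer, no cookie yet
    print(accepted, header)
    print(poll.vote(header, "B"))               # same cookie again: rejected
    print(dict(poll.counts))
```

The value returned alongside the flag is what the server would send back in a Set-Cookie header, so that the browser presents the same token on the next request.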
+  
+
+  
+  Literature: 
+  The project requires fluency in a web-programming language (for example 
+  JavaScript,
+  Go, 
+  Scala). However, JavaScript with
+  the Node.js runtime seems to be best suited for the job.
+  Here is a tutorial on Node.js for beginners.
+  For web-programming the 
+  Web Application Engineering
+  course at Udacity is a good starting point 
+  to be aware of the issues involved. This course uses Python.
+  To evaluate the answers from the students, Google's 
+  Chart Tools
+  might be useful, which are also described in this 
+  YouTube video.
+  
+
+  
+  Skills: 
+  In order to provide convenience for the lecturer, this project needs very good web-programming skills. A 
+  hacker mentality
+  (see above) is probably very beneficial: web-programming is an area that only emerged recently and
+  many tools still lack maturity. You probably have to experiment a lot with several different
+  languages and tools.
+  
+
+ -  
[CU6] An Infrastructure for Displaying and Animating Code in a Web-Browser
+  
+
+  Description:
+  The project aim is to implement an infrastructure for displaying and
+  animating code in a web-browser. The infrastructure should be agnostic
+  with respect to the programming language, but should be configurable.
+  I envisage something smaller than the projects 
+  here (for Python),
+  here (for Java),
+  here (for multiple languages),
+  here (for HTML),
+  here (for JavaScript),
+  and here (for Scala).
+  
+
+  
+  The tasks in this project are (1) to lex and parse languages and (2) to write an interpreter.
+  The goal is to implement this as much as possible in a language-agnostic fashion.
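
One way to keep the lexing step language-agnostic but configurable is to drive it from a table of token definitions. The following Python sketch assumes a made-up function make_lexer and a hypothetical token table WHILE_SPEC for a small while-language. A different language would only need a different table.

```python
import re

def make_lexer(token_spec):
    """Build a lexer from (name, regex) pairs; earlier pairs take priority."""
    master = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in token_spec))

    def lex(source):
        pos, tokens = 0, []
        while pos < len(source):
            m = master.match(source, pos)
            if m is None or m.end() == pos:
                raise SyntaxError(f"cannot lex at position {pos}")
            if m.lastgroup != "SKIP":           # whitespace is dropped
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
        return tokens

    return lex

# Hypothetical configuration for a small while-language.
WHILE_SPEC = [
    ("KEYWORD", r"\b(?:while|if|then|else|do)\b"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r":=|[-+*/<>=]"),
    ("SKIP", r"\s+"),
]

if __name__ == "__main__":
    lex = make_lexer(WHILE_SPEC)
    for token in lex("while x < 10 do x := x + 1"):
        print(token)
```

The token names could double as CSS class names when the code is displayed in a browser, so the same table serves both highlighting and parsing.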
+  
+
+  
+  Skills: 
+  Good skills in lexing and parsing, as well as fluency in web programming (for
+  example JavaScript).
+  
+
+
+ -  
[CU7] Implementation of a Distributed Clock-Synchronisation Algorithm developed at NASA
+  
+  
+  Description:
+  There are many algorithms for synchronising clocks. This
+  paper 
+  describes a new algorithm for clocks that communicate by exchanging
+  messages and thereby reach a state in which (within some bound) all clocks are synchronised.
+  A slightly longer and more detailed paper about the algorithm is 
+  here.
+  The point of this project is to implement this algorithm and simulate networks of clocks.
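
To show what simulating a network of clocks can look like, here is a minimal Python sketch. It is explicitly not the algorithm from the paper: it swaps in plain averaging with the neighbours on a ring, runs in synchronous rounds rather than with concurrent actors, and all function names are made up for this illustration.

```python
def neighbours(i, n):
    """Ring topology: clock i exchanges messages with its two neighbours."""
    return [(i - 1) % n, (i + 1) % n]

def sync_round(clocks, tick=1.0):
    """One round: every clock sends its value, averages what it hears, then ticks."""
    n = len(clocks)
    inbox = [[clocks[j] for j in neighbours(i, n)] for i in range(n)]
    return [(clocks[i] + sum(inbox[i])) / (len(inbox[i]) + 1) + tick
            for i in range(n)]

def spread(clocks):
    """Largest difference between any two clocks."""
    return max(clocks) - min(clocks)

def simulate(clocks, rounds):
    """Run several rounds; return the final clocks and the spread after each round."""
    history = [spread(clocks)]
    for _ in range(rounds):
        clocks = sync_round(clocks)
        history.append(spread(clocks))
    return clocks, history

if __name__ == "__main__":
    final, history = simulate([0.0, 10.0, 20.0, 30.0], rounds=20)
    print("spread before:", history[0])
    print("spread after: ", history[-1])
```

Recording the spread after every round gives a simple measure of whether the clocks reach a state in which they agree within some bound, which is the property the project is meant to investigate for the real algorithm.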
+  
+
+  
+  Literature: 
+  There is a wide range of literature on clock synchronisation algorithms. 
+  Some pointers are given in this
+  paper,
+  which describes the algorithm to be implemented in this project. Pointers
+  are given also here.
+  
+
+  
+  Skills: 
+  In order to implement a simulation of a network of clocks, you need to tackle
+  concurrency. You can do this for example in the programming language
+  Scala with the help of the 
+  Akka library. This library enables you to send messages
+  between different actors. Here 
+  are some examples that explain how to implement exchanging messages between actors. 
+  
+
+
+
+
+ -  
[CU8] Raspberry Pis and Arduinos
+
+  
+  Description:
+  This project is for true hackers! Raspberry Pis
+  are small Linux computers the size of a credit card that cost only £34 (see picture left below). They were introduced
+  in 2012 and people went crazy...well some of them. There is a
+  Google+ community about Raspberry Pis that has more
+  than 58k followers. It is hard to keep up with what people do with these small computers. The possibilities
+  seem to be limitless. The main resource for Raspberry Pis is here.
+  There are magazines dedicated to them and tons of
+  books (not to mention
+  floods of online material).
+  
+
+  
+  Arduinos are slightly older (from 2005) but still very cool (see picture right below). They
+  are small single-board micro-controllers that can talk to various external gadgets (sensors, motors, etc). Since Arduinos
+  are open-source software and open hardware, there are many clones and add-on boards. As with the Raspberry Pi, there
+  is a lot of material available about Arduinos.
+  The main reference is here. Like the Raspberry Pis, the good thing about
+  Arduinos is that they can be powered with simple AA-batteries.
+  
+
+  
+  I have two such Raspberry Pis including wifi-connectors and two cameras.
+  I also have two Freakduino Boards that are Arduinos extended with wireless communication. I can lend them to responsible
+  students for one or two projects. However, the aim is to first come up with an idea for a project. Popular projects are
+  automated temperature sensors, network servers, robots, web-cams (here
+  is a web-cam directed at the Shard that can
+  tell
+  you whether it is raining or cloudy). There are plenty more ideas listed
+  here for Raspberry Pis and
+  here for Arduinos.
+  
+
+  
+  There are essentially two kinds of projects: One is purely software-based. Software projects for Raspberry Pis are often
+  written in Python, but since these are Linux-capable computers any other
+  language would do as well. You can also write your own operating system as done
+  here. For example the students
+  here developed their own bare-metal OS and then implemented
+  a chess-program on top of it (have a look at their very impressive
+  YouTube video).
+  The other kind of project is a combination of hardware and software-based; usually attaching sensors
+  or motors to the Raspberry Pis and Arduinos. This might require some soldering or what is called
+  a bread-board. But be careful before choosing a project
+  involving new hardware: these devices
+  can be destroyed (if “Vin connected to GND” or “drawing more than 30mA from a GPIO”
+  does not make sense to you, you should probably stay away from such a project). 
+  
+
+  
+  
+  
+
+  
+  
+  
+
+ -  
Earlier Projects
+
+ I am also open to project suggestions from you. You might find some inspiration in my earlier projects:
+ BSc 2012, 
+ MSc 2012 
+
+ 
+