+-
[CU1] Regular Expression Matching and Derivatives
+
+
+ Description:
+ Regular expressions
+ are extremely useful for many text-processing tasks such as finding patterns in texts,
+ lexing programs, syntax highlighting and so on. Given that regular expressions were
+ introduced in the 1950s by Stephen Kleene,
+ you might think regular expressions have since been studied and implemented to death. But you would definitely be
+ mistaken: in fact they are still an active research area. For example
+ this paper
+ about regular expression matching and derivatives was presented just last summer at the international
+ FLOPS'14 conference. The task in this project is to implement their results.
+
+ The background for this project is that some regular expressions are
+ “evil”
+ and can “stab you in the back” according to
+ this blog post.
+ For example, if you use, in Python or
+ in Ruby (or in a number of other mainstream programming languages, according to this
+ blog), the
+ innocent-looking regular expression a?{28}a{28}
+ and match it, say, against the string
+ aaaaaaaaaaaaaaaaaaaaaaaaaaaa
+ (that is, 28 a's), you will soon notice that your CPU usage goes to 100%. In fact,
+ Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself:
+ re.py (Python version) and
+ re.rb
+ (Ruby version). You can imagine an attacker
+ mounting a nice DoS attack against
+ your program if it contains such an “evil” regular expression. Actually
+ Scala (and also Java) are almost immune to such
+ attacks as they can deal with strings of up to 4,300 a's in less than a second. But if you scale
+ the regular expression and string further to, say, 4,600 a's, then you get a StackOverflowError
+ potentially crashing your program. Moreover (besides the "minor" problem of being painfully slow) according to this
+ report
+ nearly all POSIX regular expression matchers are actually buggy.
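+
+ The blow-up described above can be observed with a few lines of Python. This is a minimal sketch and
+ not the re.py script linked above: the function name evil_match_seconds is hypothetical, and the pattern
+ is written as (a?){n}a{n}, with the optional part in parentheses, on the assumption that Python's re
+ module requires that grouping. Start with a small n and increase it gradually, since the running time
+ grows roughly exponentially.

```python
import re
import time

def evil_match_seconds(n):
    """Match (a?){n}a{n} against n 'a's and report (matched, elapsed seconds)."""
    # Assumption: the optional 'a' is grouped so that re accepts the repetition.
    pattern = re.compile("(a?){%d}a{%d}" % (n, n))
    text = "a" * n
    start = time.perf_counter()
    matched = pattern.fullmatch(text) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    # Keep n small here; each extra 'a' roughly doubles the work of a backtracking matcher.
    for n in (10, 15, 20):
        matched, seconds = evil_match_seconds(n)
        print(n, matched, round(seconds, 4))
```

+ The match always succeeds; only the time needed to find it changes with n.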
+
+
+
+ On a rainy afternoon, I implemented
+ this
+ regular expression matcher in Scala. It is not as fast as the official one in Scala, but
+ it can match up to 11,000 a's in less than 5 seconds without raising any exception
+ (remember Python and Ruby both need nearly 30 seconds to process 28(!) a's, and Scala's
+ official matcher maxes out at 4,600 a's). My matcher is approximately
+ 85 lines of code and based on the concept of
+ derivatives of regular expressions.
+ These derivatives were introduced in 1964 by
+ Janusz Brzozowski, but according to this
+ paper had been lost in the “sands of time”.
+ The advantage of derivatives is that they side-step completely the usual
+ translations of regular expressions
+ into NFAs or DFAs, which can introduce the exponential behaviour exhibited by the regular
+ expression matchers in Python and Ruby.
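+
+ To make the idea of derivatives concrete, here is a minimal sketch in Python. It is not the 85-line
+ Scala matcher mentioned above, and all names (alt, seq, der, nullable, matches, evil) are chosen for
+ illustration only. The derivative of a regular expression r with respect to a character c is a regular
+ expression matching exactly those strings s for which r matches c followed by s. A string is matched by
+ taking one derivative per character and then checking whether the result accepts the empty string.
+ The sketch assumes that alternatives are kept as sets and that results are cached, so that duplicate
+ branches collapse instead of piling up.

```python
from functools import lru_cache

# Regular expressions are nested tuples, which are hashable and can be cached.
ZERO = ("zero",)   # matches no string
ONE = ("one",)     # matches only the empty string

def char(c):
    return ("chr", c)

def star(r):
    return ("star", r)

def alt(*rs):
    """Alternative, flattened into a set so that duplicate branches collapse."""
    branches = set()
    for r in rs:
        if r[0] == "alt":
            branches |= r[1]
        elif r != ZERO:
            branches.add(r)
    if not branches:
        return ZERO
    if len(branches) == 1:
        return next(iter(branches))
    return ("alt", frozenset(branches))

def seq(r1, r2):
    """Sequence with the usual simplifications for ZERO and ONE."""
    if r1 == ZERO or r2 == ZERO:
        return ZERO
    if r1 == ONE:
        return r2
    if r2 == ONE:
        return r1
    return ("seq", r1, r2)

@lru_cache(maxsize=None)
def nullable(r):
    """Does r match the empty string?"""
    tag = r[0]
    if tag in ("one", "star"):
        return True
    if tag == "alt":
        return any(nullable(x) for x in r[1])
    if tag == "seq":
        return nullable(r[1]) and nullable(r[2])
    return False  # zero, chr

@lru_cache(maxsize=None)
def der(c, r):
    """Derivative of r with respect to the character c."""
    tag = r[0]
    if tag == "chr":
        return ONE if r[1] == c else ZERO
    if tag == "alt":
        return alt(*(der(c, x) for x in r[1]))
    if tag == "seq":
        left = seq(der(c, r[1]), r[2])
        return alt(left, der(c, r[2])) if nullable(r[1]) else left
    if tag == "star":
        return seq(der(c, r[1]), r)
    return ZERO  # zero, one

def matches(r, s):
    for c in s:
        r = der(c, r)
    return nullable(r)

def evil(n):
    """The expression (a?){n}a{n}, built from the primitives above."""
    r = ONE
    for _ in range(n):
        r = seq(char("a"), r)
    for _ in range(n):
        r = seq(alt(ONE, char("a")), r)
    return r
```

+ With these definitions, matches(evil(28), "a" * 28) can be evaluated directly; no automaton is built
+ and no backtracking takes place.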
+
+
+
+ Now the authors from the
+ FLOPS'14-paper mentioned
+ above claim they are even faster than me and can deal with even more features of regular expressions
+ (for example subexpression matching, which my rainy-afternoon matcher cannot). I am sure they thought
+ about the problem much longer than a single afternoon. The task
+ in this project is to find out how good they actually are by implementing the results from their paper.
+ Their approach is based on the concept of derivatives.
+ I used them once myself in a paper
+ in order to prove the Myhill-Nerode theorem.
+ So I know they are worth their money. Still, it would be interesting to actually compare their results
+ with my simple rainy-afternoon matcher and potentially “blow away” the regular expression matchers
+ in Python and Ruby (and possibly in Scala too). The application would be to implement a fast lexer for
+ programming languages.
+
+
+
+ Literature:
+ The place to start with this project is obviously this
+ paper.
+ Traditional methods for regular expression matching are explained
+ in the Wikipedia articles
+ here and
+ here.
+ The authoritative book
+ on automata and regular expressions is by John Hopcroft and Jeffrey Ullman (available in the library).
+ There is also an online course about this topic by Ullman at
+ Coursera, though IMHO not
+ done with love.
+ Finally, there are millions of other pointers about regular expression
+ matching on the Web. I found the chapter on Lexing in this
+ online book very helpful.
+ Test cases for “evil”
+ regular expressions can be obtained from here.
+
+
+
+
+ Skills:
+ This is a project for a student with an interest in theory and some
+ reasonable programming skills. The project can be easily implemented
+ in functional languages like
+ Scala,
+ F#,
+ ML,
+ Haskell, etc. Python and other non-functional languages
+ can also be used, but seem much less convenient.
+
+
+ -
[CU2] A Compiler for a small Programming Language
+
+
+ Description:
+ Compilers translate high-level programs that humans can read and write into
+ efficient machine code that can be run on a CPU or virtual machine.
+ A compiler for a simple functional language generating X86 code is described
+ here.
+ I recently implemented a very simple compiler for an even simpler functional
+ programming language following this
+ paper
+ (also described here).
+ My code for this compiler, written in Scala, is
+ here.
+ The compiler can deal with simple programs involving natural numbers, such
+ as Fibonacci numbers or factorial (but it can be easily extended - that is not the point).
+
+
+
+ While the hard work has been done (understanding the two papers above),
+ my compiler only produces some idealised machine code. For example I
+ assume there are infinitely many registers. The goal of this
+ project is to generate machine code that is more realistic and can
+ run on a CPU, like X86, or run on a virtual machine, say the JVM.
+ This probably gives a speedup of a thousand times in comparison to
+ my naive machine code and virtual machine. The project
+ requires digging into the literature about real CPUs and generating
+ real machine code.
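+
+ One way to get away from the assumption of infinitely many registers is to target a stack machine,
+ which is the model the JVM uses. The following Python sketch is not my Scala compiler; the expression
+ format (nested tuples) and the instruction names push, add, sub and mul are hypothetical and chosen
+ only to illustrate the idea. Arguments are pushed onto a stack and each operation pops its operands,
+ so no register allocation is needed.

```python
def compile_expr(expr):
    """Translate a nested-tuple expression into a flat list of stack instructions."""
    if isinstance(expr, int):
        return [("push", expr)]
    op, left, right = expr
    # Post-order: code for both operands first, then the operation itself.
    return compile_expr(left) + compile_expr(right) + [(op,)]

OPERATIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def run(code):
    """A tiny interpreter for the instructions produced by compile_expr."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATIONS[instr[0]](left, right))
    return stack.pop()
```

+ For example, compile_expr(("add", 2, ("mul", 3, 4))) yields five instructions, and running them
+ leaves 14 on the stack.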
+
+
+ An alternative is to not generate machine code, but build a compiler that compiles to
+ JavaScript. This is the language that is supported by most
+ browsers and therefore is a favourite
+ vehicle for Web-programming. Some call it the scripting language of the Web.
+ Unfortunately, JavaScript is also probably one of the worst
+ languages to program in (being designed and released in a hurry). But it can be used as a convenient target
+ for translating programs from other languages. In particular there are two
+ technologies that can be used for this purpose:
+ one is asm.js, a very optimised subset of JavaScript, and the other is
+ emscripten, a compiler that translates LLVM code into JavaScript.
+ There is a tutorial for emscripten
+ and an impressive demo which runs the
+ Unreal Engine 3
+ in a browser with spectacular speed. This was achieved by compiling the
+ C-code of the Unreal Engine to the LLVM intermediate language and then translating the LLVM
+ code to JavaScript.
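+
+ As a rough illustration of what translating into JavaScript can look like, here is a small Python
+ sketch that renders integer expressions as JavaScript source text. The function names to_js and
+ js_function and the tuple-based expression format are hypothetical. The generated code only borrows
+ the asm.js convention of marking integer results with |0; it is an assumption-laden toy and does not
+ produce code that would validate as asm.js.

```python
def to_js(expr):
    """Render a nested-tuple integer expression as JavaScript source text."""
    if isinstance(expr, int):
        return str(expr)
    if isinstance(expr, str):  # a variable name
        return expr
    op, left, right = expr
    symbol = {"add": "+", "sub": "-"}[op]
    # The trailing |0 follows the asm.js style of coercing a result to an integer.
    return "((%s %s %s)|0)" % (to_js(left), symbol, to_js(right))

def js_function(name, params, body):
    """Wrap an expression into a JavaScript function definition."""
    return "function %s(%s) { return %s; }" % (name, ", ".join(params), to_js(body))
```

+ Calling js_function("f", ["x", "y"], ("add", "x", "y")) returns the text of a JavaScript function
+ that can be pasted into a browser console.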
+
+
+
+ Literature:
+ There is a lot of literature about compilers
+ (for example this book -
+ I can lend you my copy for the duration of the project, or this
+ online book). A very good overview article
+ about implementing compilers by
+ Laurie Tratt is
+ here.
+ An online book about the Art of Assembly Language is
+ here.
+ An introduction to x86 machine code is here.
+ Intel's official manual for the x86 instruction set is
+ here.
+ A simple assembler for the JVM is described here.
+ An interesting twist of this project is to not generate code for a CPU, but
+ for the intermediate language of the LLVM compiler
+ (also described here). If you want to see
+ what machine code looks like you can compile your C-program using gcc -S.
+
+
+ If JavaScript is chosen as a target instead, then there are plenty of tutorials on the Web.
+ Here is a list of free books on JavaScript.
+ A project from which you can draw inspiration is this
+ List-to-JavaScript
+ translator. Here is another such project.
+ And another in less than 100 lines of code.
+ Coffeescript is a similar project
+ except that it is already quite mature. And finally not to
+ forget TypeScript developed by Microsoft. The main
+ difference between these projects and this one is that they translate into relatively high-level
+ JavaScript code; none of them use the much lower-level asm.js and
+ emscripten.
+
+
+ Skills:
+ This is a project for a student with a deep interest in programming languages and
+ compilers. Since my compiler is implemented in Scala,
+ it would make sense to continue this project in this language. I can be
+ of help with questions and books about Scala.
+ But if Scala is a problem, my code can also be translated quickly into any other functional
+ language.
+
+
+
+ PS: Compiler projects, like this one or [CU8], consistently received high marks in the past.
+ I supervised four so far and none of them received a mark below 70% - one even was awarded a prize.
+
+
+ -
[CU3] Slide-Making in the Web-Age
+
+
+ The standard technology for writing scientific papers in Computer Science is to use
+ LaTeX, a document preparation
+ system originally implemented by Donald Knuth
+ and Leslie Lamport.
+ LaTeX produces very pleasant-looking documents, can deal nicely with mathematical
+ formulas and is very flexible. If you are interested, here
+ is a side-by-side comparison between Word and LaTeX (which LaTeX “wins” with 18 out of 21 points).
+ Computer scientists not only use LaTeX for documents,
+ but also for slides (really, nobody who wants to be cool uses Keynote or Powerpoint).
+
+
+
+ Although used widely, LaTeX seems nowadays a bit dated for producing
+ slides. Unlike documents, which are typically “static” and published in a book or journal,
+ slides often contain changing contents that might first only be partially visible and
+ only later be revealed as the “story” of a talk or lecture demands.
+ Also slides often contain animated algorithms where each state in the
+ calculation is best explained by highlighting the changing data.
+
+
+
+ It seems HTML and JavaScript are much better suited for generating
+ such animated slides. This page
+ links to 22 slide-generating programs using this combination of technologies.
+ However, the problem with all of these projects is that they depend heavily on the users being
+ able to write JavaScript, CSS or HTML...not something one would like to depend on given that
+ “normal” users likely only have a LaTeX background. The aim of this project is to invent a
+ very simple language that is inspired by LaTeX and then generate from code written in this language
+ slides that can be displayed in a web-browser.
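+
+ To give a feeling for the task, here is a minimal sketch in Python of such a translation. The input
+ syntax (Beamer-like frame environments containing \item lines), the function name slides_to_html and
+ the shape of the generated HTML are all assumptions made for this illustration; designing the actual
+ language is part of the project.

```python
import html
import re

# A frame looks like: \begin{frame}{Title} ... \end{frame}
FRAME = re.compile(r"\\begin\{frame\}\{(.*?)\}(.*?)\\end\{frame\}", re.DOTALL)

def slides_to_html(source):
    """Turn every frame into one <section> with a heading and a bullet list."""
    sections = []
    for title, body in FRAME.findall(source):
        items = [line.strip()[len("\\item"):].strip()
                 for line in body.splitlines()
                 if line.strip().startswith("\\item")]
        bullets = "".join("<li>%s</li>" % html.escape(item) for item in items)
        sections.append("<section><h1>%s</h1><ul>%s</ul></section>"
                        % (html.escape(title), bullets))
    return "\n".join(sections)
```

+ A real implementation would need a proper parser rather than a single regular expression, for example
+ to handle nested environments and stepwise reveals.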
+
+
+
+ This sounds complicated, but there is already some help available:
+ Mathjax is a JavaScript library that can
+ be used to display mathematical text, for example
+
+
+ When \(a \ne 0\), there are two solutions to \(ax^2 + bx + c = 0\) and they are
+ \(x = {-b \pm \sqrt{b^2-4ac} \over 2a}\).
+
+
+ by writing code in the familiar LaTeX-way. This can be reused.
+ Another such library is KaTeX.
+ There are also plenty of JavaScript
+ libraries for graphical animations (for example
+ Raphael,
+ SVG.JS,
+ Bonsaijs,
+ JSXGraph). The inspiration for how the user should be able to write
+ slides could come from the LaTeX packages Beamer
+ and PGF/TikZ.
+
+
+
+ Skills:
+ This project requires good knowledge of JavaScript. You need to be able to
+ parse a language and translate it to a suitable part of JavaScript using
+ appropriate libraries. Tutorials for JavaScript are here.
+ A parser generator for JavaScript is here. There are probably also
+ others. If you want to avoid JavaScript there are a number of alternatives: for example the
+ Elm
+ language has been especially designed for easily implementing interactive animations, which would be
+ very convenient for this project.
+
+
+ -
[CU4] An Online Student Voting System
+
+
+ Description:
+ One of the more annoying aspects of giving a lecture is asking the students
+ a question and, no matter how easy the question is, not
+ receiving any answer. The online course system
+ Udacity, in contrast, made an art out of
+ asking questions during lectures (see for example the
+ Web Application Engineering
+ course CS253).
+ The lecturer there gives multiple-choice questions as part of the lecture and the students need to
+ click on the appropriate answer. This works very well in the online world.
+ For “real-world” lectures, the department has some
+ clickers
+ (these are little devices which form a part of an audience response system). However,
+ they are a logistic nightmare for the lecturer: they need to be distributed
+ during the lecture and collected at the end. Nowadays, when students
+ come with their own laptops or smartphones to lectures, this can
+ be improved.
+
+
+
+ The task of this project is to implement an online student
+ polling system. The lecturer should be able to prepare
+ questions beforehand (encoded as some web-form) and be able to
+ show them during the lecture. The students
+ can give their answers by clicking on the corresponding webpage.
+ The lecturer can then collect the responses online and evaluate them
+ immediately. Such a system is sometimes called
+ HTML voting.
+ There are a number of commercial
+ solutions for this problem, but they are not easy to use (in addition
+ to being ridiculously expensive). A good student can easily improve upon
+ what they provide.
+
+
+
+ The problem of student polling is not as hard as
+ electronic voting,
+ which essentially is still an unsolved problem in Computer Science. The
+ students only need to be prevented from answering a question more than once and thus skewing
+ any statistics. Unlike electronic voting, no audit trail needs to be kept
+ for student polling. Restricting the number of answers can probably be solved
+ by setting appropriate cookies on the students'
+ computers or smartphones.
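+
+ The server-side logic for this can be sketched in a few lines of Python. The class name Poll and its
+ methods are hypothetical; the sketch assumes that each student's browser stores a random token in a
+ cookie and sends it along with every answer, and it leaves out the web framework entirely.

```python
import secrets
from collections import Counter

class Poll:
    """One multiple-choice question; every token may answer at most once."""

    def __init__(self, options):
        self.options = list(options)
        self.tally = Counter()
        self.voted = set()

    def issue_token(self):
        # The value that would be stored in the student's cookie.
        return secrets.token_hex(16)

    def vote(self, token, option):
        """Record the answer; return False for unknown options or repeated votes."""
        if option not in self.options or token in self.voted:
            return False
        self.voted.add(token)
        self.tally[option] += 1
        return True
```

+ The lecturer's view then only needs to read poll.tally to display the current results.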
+
+
+
+ Literature:
+ The project requires fluency in a web-programming language (for example
+ JavaScript,
+ Go,
+ Scala). However, JavaScript with
+ the Node.js runtime seems to be best suited for the job.
+ Here is a tutorial on Node.js for beginners.
+ For web-programming the
+ Web Application Engineering
+ course at Udacity is a good starting point
+ to be aware of the issues involved. This course uses Python.
+ To evaluate the answers from the students, Google's
+ Chart Tools
+ might be useful, which is also described in this
+ youtube video.
+
+
+
+ Skills:
+ In order to provide convenience for the lecturer, this project needs very good web-programming skills. A
+ hacker mentality
+ (see above) is probably also very beneficial: web-programming is an area that only emerged recently and
+ many tools still lack maturity. You probably have to experiment a lot with several different
+ languages and tools.
+
+
+ -
[CU5] Raspberry Pi's and Arduinos
+
+
+ Description:
+ This project is for true hackers! Raspberry Pi's
+ are small Linux computers the size of a credit-card and only cost £34 (see picture on the left below). They were introduced
+ in 2012 and people went crazy...well some of them. There is a
+ Google+ community about Raspberry Pi's that has more
+ than 172k followers. It is hard to keep up with what people do with these small computers. The possibilities
+ seem to be limitless. The main resource for Raspberry Pi's is here.
+ There are magazines dedicated to them and tons of
+ books (not to mention
+ floods of online material).
+ Google just released a
+ framework
+ for web-programming on Raspberry Pi's turning them into webservers.
+
+
+
+ Arduinos are slightly older (from 2005) but still very cool (see picture on the right below). They
+ are small single-board micro-controllers that can talk to various external gadgets (sensors, motors, etc). Since Arduinos
+ are open-source software and open hardware, there are many clones and add-on boards. As with the Raspberry Pi, there
+ is a lot of material available about Arduinos.
+ The main reference is here. Like the Raspberry Pi's, the good thing about
+ Arduinos is that they can be powered with simple AA-batteries.
+
+
+
+ I have two such Raspberry Pi's including wifi-connectors and two cameras.
+ I also have two Freakduino Boards that are Arduinos extended with wireless communication. I can lend them to responsible
+ students for one or two projects. However, the aim is to first come up with an idea for a project. Popular projects are
+ automated temperature sensors, network servers, robots, web-cams (here
+ is a web-cam directed at the Shard that can
+ tell
+ you whether it is raining or cloudy). There are plenty more ideas listed
+ here for Raspberry Pi's and
+ here for Arduinos.
+
+
+
+ There are essentially two kinds of projects: One is purely software-based. Software projects for Raspberry Pi's are often
+ written in Python, but since these are Linux-capable computers any other
+ language would do as well. You can also write your own operating system as done
+ here. For example the students
+ here developed their own bare-metal OS and then implemented
+ a chess-program on top of it (have a look at their very impressive
+ youtube video).
+ The other kind of project is a combination of hardware and software; usually attaching some sensors
+ or motors to the Raspberry Pi or Arduino. This might require some soldering or what is called
+ a bread-board. But be careful before choosing a project
+ involving new hardware: these devices
+ can be destroyed (if “Vin connected to GND” or “drawing more than 30mA from a GPIO”
+ does not make sense to you, you should probably stay away from such a project).
+
+
+
+
+
+
+
+
+
+
+
+ Skills:
+ Well, you must be a hacker; happy to make things. Your desk might look like the one in the photo on the left.
+ The right photo shows an earlier student project which wirelessly connects a wearable Arduino (packaged
+ in a "self-3d-printed" watch) to a Raspberry Pi seen in the background. The Arduino takes measurements of
+ heart rate and body temperature; the Raspberry Pi collects this data and makes it accessible via a simple
+ web-service.
+
+
+
+
+
+
+
+ -
[CU6] Generating Testcases from a Specification
+
+ -
[CU7] GPRS + GPS for Arduinos
+
+ -
[CU8] Language Translator into JavaScript
+
+
+ Description:
+ JavaScript is a language that is supported by most
+ browsers and therefore is a favourite
+ vehicle for Web-programming. Some call it the scripting language of the Web.
+ Unfortunately, JavaScript is probably one of the worst
+ languages to program in (being designed and released in a hurry). But it can be used as a convenient target
+ for translating programs from other languages. In particular there are two
+ technologies that can be used for this purpose:
+ one is asm.js, a very optimised subset of JavaScript, and the other is
+ emscripten, a compiler that translates LLVM code into JavaScript.
+ There is a tutorial for emscripten
+ and an impressive demo which runs the
+ Unreal Engine 3
+ in a browser with spectacular speed. This was achieved by compiling the
+ C-code of the Unreal Engine to the LLVM intermediate language and then translating the LLVM
+ code to JavaScript.
+
+
+
+ Skills:
+ This project is about exploring asm.js and emscripten and implementing a translator
+ of a small language into them. This is similar to the project [CU2] above and requires
+ similar skills. In addition it would be good to have already some familiarity with JavaScript.
+ There are plenty of tutorials on the Web.
+ Here is a list of free books on JavaScript.
+ This is a project for a student who wants to get more familiar with JavaScript and Web-programming.
+ A project from which you can draw inspiration is this
+ List-to-JavaScript
+ translator. Here is another such project.
+ And another in less than 100 lines of code.
+ Coffeescript is a similar project
+ except that it is already quite mature. And finally not to
+ forget TypeScript developed by Microsoft. The main
+ difference between these projects and this one is that they translate into relatively high-level
+ JavaScript code; none of them use the much lower-level asm.js and
+ emscripten.
+
+
+
+
+ -
[CU9] An Infrastructure for Displaying and Animating Code in a Web-Browser
+
+
+ Description:
+ The project aim is to implement an infrastructure for displaying and
+ animating code in a web-browser. The infrastructure should be agnostic
+ with respect to the programming language, but should be configurable.
+ I envisage something smaller than the projects
+ here (for Python),
+ here (for Java),
+ here (for multiple languages),
+ here (for HTML),
+ here (for JavaScript),
+ and here (for Scala).
+
+
+
+ The tasks in this project are (1) to lex and parse languages and (2) to write an interpreter.
+ The goal is to implement this as much as possible in a language-agnostic fashion.
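+
+ One way to keep the lexing part language-agnostic is to drive a generic lexer by a table of token
+ rules. The following Python sketch illustrates this under some assumptions: the function name
+ make_lexer, the rule format (pairs of token name and regular expression) and the special rule name
+ SKIP for ignored input are all hypothetical.

```python
import re

def make_lexer(token_spec):
    """Build a lexer from (name, regex) pairs; tokens named SKIP are dropped."""
    master = re.compile("|".join("(?P<%s>%s)" % pair for pair in token_spec))

    def lex(text):
        pos, tokens = 0, []
        while pos < len(text):
            m = master.match(text, pos)
            # Reject input no rule matches, and rules that match the empty string.
            if m is None or m.end() == pos:
                raise SyntaxError("unexpected character %r at position %d" % (text[pos], pos))
            if m.lastgroup != "SKIP":
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
        return tokens

    return lex

# An example configuration for a small expression language.
EXPRESSION_RULES = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
```

+ Supporting another programming language then means supplying a different rule table rather than
+ writing a new lexer.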
+
+
+
+ Skills:
+ Good skills in lexing and language parsing, as well as being fluent with web programming (for
+ example JavaScript).
+
+
+
+ -
[CU10] Implementation of a Distributed Clock-Synchronisation Algorithm developed at NASA
+
+
+ Description:
+ There are many algorithms for synchronising clocks. This
+ paper
+ describes a new algorithm for clocks that communicate by exchanging
+ messages and thereby reach a state in which (within some bound) all clocks are synchronised.
+ A slightly longer and more detailed paper about the algorithm is
+ here.
+ The point of this project is to implement this algorithm and simulate networks of clocks.
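+
+ To indicate what simulating a network of clocks might look like, here is a very small Python sketch.
+ It does not implement the algorithm from the paper: as a stand-in it uses a naive averaging rule on a
+ ring network, in which every clock repeatedly replaces its value by the average of itself and its two
+ neighbours. The function names and the synchronous rounds are assumptions made for this illustration;
+ a realistic simulation would also need clock drift and message delays.

```python
def synchronise_round(clocks):
    """One synchronous round on a ring: each clock averages itself with both neighbours."""
    n = len(clocks)
    return [(clocks[i - 1] + clocks[i] + clocks[(i + 1) % n]) / 3.0 for i in range(n)]

def spread(clocks):
    """Largest difference between any two clocks."""
    return max(clocks) - min(clocks)

def simulate(clocks, rounds):
    """Run the given number of rounds and return the resulting clock values."""
    for _ in range(rounds):
        clocks = synchronise_round(clocks)
    return clocks
```

+ Plotting spread(simulate(start, k)) for increasing k shows how quickly the clocks agree within a
+ given bound.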
+
+
+
+ Literature:
+ There is a wide range of literature on clock synchronisation algorithms.
+ Some pointers are given in this
+ paper,
+ which describes the algorithm to be implemented in this project. Pointers
+ are given also here.
+
+
+
+ Skills:
+ In order to implement a simulation of a network of clocks, you need to tackle
+ concurrency. You can do this for example in the programming language
+ Scala with the help of the
+ Akka library. This library enables you to send messages
+ between different actors. Here
+ are some examples that explain how to implement exchanging messages between actors.
+
+
+ -
[CU11] Proving the Correctness of Programs
+
+
+ I am one of the main developers of the interactive theorem prover
+ Isabelle. This theorem prover
+ has been used to establish the correctness of some quite large
+ programs (for example an operating system).
+ Together with colleagues from Nanjing, I used this theorem prover to establish the correctness of a
+ scheduling algorithm, called
+ Priority Inheritance,
+ for real time operating systems. This scheduling algorithm is part of the operating
+ system that drives, for example, the
+ Mars rovers.
+ Actually, the very first Mars rover mission in 1997 did not have this
+ algorithm switched on and it almost caused a catastrophic mission failure (see
+ this youtube video here
+ for an explanation of what happened).
+ We were able to prove the correctness of this algorithm, and were also able to
+ establish the correctness of some optimisations in this
+ paper.
+
+
+ On a much smaller scale, there are a few small programs and underlying algorithms where it
+ is not really understood whether they always compute a correct result (for example the
+ regular expression matcher by Sulzmann and Lu in project [CU1]). The aim of this
+ project is to completely specify an algorithm in Isabelle and then prove it correct (that is,
+ it always computes the correct result).
+
+
+
+ Skills:
+ This project is for a very good student with a knack for theoretical things and formal reasoning.
+
+
+ -
Earlier Projects
+
+ I am also open to project suggestions from you. You might find some inspiration from my earlier projects:
+ BSc 2012/13,
+ MSc 2012/13,
+ BSc 2013/14,
+ MSc 2013/14
+
+