Email: christian dot urban at kcl dot ac dot uk, Office: Bush House N7.07
+
+
Note that besides being a lecturer at the theoretical end of Computer Science, I am also a passionate
+ hacker …
+ defined as “a person who enjoys exploring the details of programmable systems and
+ stretching their capabilities, as opposed to most users, who prefer to learn only the minimum
+ necessary.” I am always happy to supervise like-minded students.
+
+
+
In 2013/14, I was nominated by the students
+ for the best BSc project supervisor and best MSc project supervisor awards in the NMS
+ faculty. Somehow I won both. In 2014/15 I was nominated again for the best MSc
+ project supervisor, but did not win it. ;o)
+
+
+
+
[CU1] Regular Expressions, Lexing and Derivatives
+
+
+ Description:
+ Regular expressions
+ are extremely useful for many text-processing tasks, such as finding patterns in hostile
+ network traffic,
+ lexing programs, syntax highlighting and so on. Given that regular expressions were
+ introduced in 1950 by Stephen Kleene,
+ you might think regular expressions have since been studied and implemented to death. But you would definitely be
+ mistaken: in fact they are still an active research area. Off the top of my head, I can give
+ you at least ten research papers that appeared in the last few years.
+ For example
+ this paper
+ about regular expression matching and derivatives was presented in 2014 at the international
+ FLOPS conference. Another paper by my PhD student and me was presented in 2016
+ at the international ITP conference.
+ The task in this project is to implement these results and use them for lexing.
+
+
The background for this project is that some regular expressions are
+ “evil”
+ and can “stab you in the back” according to
+ this blog post.
+ For example, if you use in Python or
+ in Ruby (or also in a number of other mainstream programming languages) the
+ innocent-looking regular expression a?{28}a{28} and match it, say, against the string
+ aaaaaaaaaaaaaaaaaaaaaaaaaaaa (that is 28 as), you will soon notice that your CPU usage goes to 100%. In fact,
+ Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself:
+ catastrophic.py (Python version) and
+ catastrophic.rb
+ (Ruby version). Here is a similar problem with the regular expression (a*)*b in Java:
+ catastrophic.java
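The blow-up described above can be observed with a few lines of Python. This is only a minimal sketch and not the catastrophic.py file mentioned above: the helper name time_match is hypothetical, and the pattern is written as (a?){n}a{n}, with the optional a grouped, which is an assumption about how the expression is meant to be read. The run time roughly doubles with every increment of n, so the sketch stays well below n = 28.

```python
import re
import time


def time_match(n):
    """Match (a?){n}a{n} against n copies of 'a'.

    Returns (matched, seconds). The grouping (a?) is an assumption
    about the intended reading of the pattern.
    """
    pattern = re.compile("(a?){%d}a{%d}" % (n, n))
    start = time.perf_counter()
    matched = pattern.fullmatch("a" * n) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Small values only: each step of 4 multiplies the work by about 16.
    for n in (8, 12, 16, 20):
        matched, seconds = time_match(n)
        print(f"n={n:2d} matched={matched} time={seconds:.4f}s")
```

The match always succeeds; what changes is how long the backtracking matcher needs to find it.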
+
+
+
+ You can imagine an attacker
+ mounting a nice DoS attack against
+ your program if it contains such an “evil” regular expression. But it can also happen by accident:
+ on 20 July 2016 the website Stack Exchange
+ was knocked offline because of an evil regular expression. One of their engineers talks about this in this
+ video. A similar problem needed to be fixed in the
+ Atom editor.
+ A few implementations of regular expression matchers are almost immune from such problems.
+ For example, Scala can deal with strings of up to 4,300 as in less than a second. But if you scale
+ the regular expression and string further to, say, 4,600 as, then you get a StackOverflowError
+ potentially crashing your program. Moreover (besides the "minor" problem of being painfully slow) according to this
+ report
+ nearly all regular expression matchers using the POSIX rules are actually buggy.
+
+
+
+ On a rainy afternoon, I implemented
+ this
+ regular expression matcher in Scala. It is not as fast as the official one in Scala, but
+ it can match up to 11,000 as in less than 5 seconds without raising any exception
+ (remember Python and Ruby both need nearly 30 seconds to process 28(!) as, and Scala's
+ official matcher maxes out at 4,600 as). My matcher is approximately
+ 85 lines of code and based on the concept of
+ derivatives of regular expressions.
+ These derivatives were introduced in 1964 by
+ Janusz Brzozowski, but according to this
+ paper had been lost in the “sands of time”.
+ The advantage of derivatives is that they completely side-step the usual
+ translations of regular expressions
+ into NFAs or DFAs, which can introduce the exponential behaviour exhibited by the regular
+ expression matchers in Python, Java and Ruby.
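To give a flavour of the idea, here is a minimal sketch of a derivative-based matcher. It is written in Python for illustration and is not the Scala matcher mentioned above; all names (Zero, One, Chr, Alt, Seq, Star, alt, seq, nullable, der, matches, evil) are chosen for this sketch. It assumes two simplifications: alternatives are kept as a set, so duplicates disappear, and trivial sequences are collapsed. Without some simplification of this kind the derivatives can grow very large.

```python
from dataclasses import dataclass
from functools import reduce


class Rexp:
    pass

@dataclass(frozen=True)
class Zero(Rexp):            # matches no string at all
    pass

@dataclass(frozen=True)
class One(Rexp):             # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr(Rexp):             # matches a single character
    c: str

@dataclass(frozen=True)
class Alt(Rexp):             # alternatives, stored as a set
    rs: frozenset

@dataclass(frozen=True)
class Seq(Rexp):             # r1 followed by r2
    r1: Rexp
    r2: Rexp

@dataclass(frozen=True)
class Star(Rexp):            # zero or more repetitions of r
    r: Rexp


def alt(*rs):
    """Build an alternative: flatten nested Alts, drop Zero and duplicates."""
    flat = set()
    for r in rs:
        if isinstance(r, Alt):
            flat |= r.rs
        elif not isinstance(r, Zero):
            flat.add(r)
    if not flat:
        return Zero()
    return next(iter(flat)) if len(flat) == 1 else Alt(frozenset(flat))


def seq(r1, r2):
    """Build a sequence, collapsing the trivial cases."""
    if isinstance(r1, Zero) or isinstance(r2, Zero):
        return Zero()
    if isinstance(r1, One):
        return r2
    if isinstance(r2, One):
        return r1
    return Seq(r1, r2)


def nullable(r):
    """Can r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return any(nullable(x) for x in r.rs)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False             # Zero and Chr


def der(c, r):
    """Derivative of r with respect to c: what remains to be matched after c."""
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return alt(*(der(c, x) for x in r.rs))
    if isinstance(r, Seq):
        left = seq(der(c, r.r1), r.r2)
        return alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return seq(der(c, r.r), r)
    return Zero()            # Zero and One


def matches(r, s):
    """Take the derivative for each character, then test for nullability."""
    for c in s:
        r = der(c, r)
    return nullable(r)


def evil(n):
    """n optional a's followed by n mandatory a's, as a right-nested sequence."""
    a = Chr("a")
    parts = [alt(a, One())] * n + [a] * n
    return reduce(lambda acc, r: seq(r, acc), reversed(parts), One())


if __name__ == "__main__":
    print(matches(evil(28), "a" * 28))
```

Note that there is no backtracking anywhere: the string is read once from left to right, and the work per character depends only on the size of the current derivative.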
+
+
+
+ Now the authors from the
+ FLOPS'14-paper mentioned
+ above claim they are even faster than me and can deal with even more features of regular expressions
+ (for example subexpression matching, which my rainy-afternoon matcher cannot). I am sure they thought
+ about the problem much longer than a single afternoon. The task
+ in this project is to find out how good they actually are by implementing the results from their paper.
+ Their approach to regular expression matching is also based on the concept of derivatives.
+ I used derivatives very successfully once for something completely different in a
+ paper
+ about the Myhill-Nerode theorem.
+ So I know they are worth their money. Still, it would be interesting to actually compare their results
+ with my simple rainy-afternoon matcher and potentially “blow away” the regular expression matchers
+ in Python, Ruby and Java (and possibly in Scala too). The application would be to implement a fast lexer for
+ programming languages, or improve the network traffic analysers in the tools Snort and
+ Bro.
+
+
+
+ Literature:
+ The place to start with this project is obviously this
+ paper
+ and this one.
+ Traditional methods for regular expression matching are explained
+ in the Wikipedia articles
+ here and
+ here.
+ The authoritative book
+ on automata and regular expressions is by John Hopcroft and Jeffrey Ullman (available in the library).
+ There is also an online course about this topic by Ullman at
+ Coursera, though IMHO not
+ done with love.
+ There are millions of other pointers about regular expression
+ matching on the Web. I found the chapter on Lexing in this
+ online book very helpful. Finally, it will
+ be of great help for this project to take part in my Compilers and Formal Languages module (6CCS3CFL).
+ Test cases for “evil”
+ regular expressions can be obtained from here.
+
+
+
+ Skills:
+ This is a project for a student with an interest in theory and with
+ good programming skills. The project can be easily implemented
+ in functional languages like
+ Scala,
+ F#,
+ ML,
+ Haskell, etc. Python and other non-functional languages
+ can also be used, but seem much less convenient. If you do attend my Compilers and Formal Languages
+ module, that would obviously give you a head-start with this project.
+
+
+
[CU2] Grammars and Derivative-Based Parsing Algorithms
+
+
+Parsing is an old nut. Generations of software developers have needed to parse data or text.
+There are zillions of links, tools, papers and textbooks about parsing. One particular
+book contains something
+like 700 different algorithms, nicely analysed and described. Surely, parsing must be a solved problem. Or is it?
+Laurie Tratt has a blog post
+about Parsing: The Solved Problem That Isn't. IMHO parsing is still a wide open field and not solved at all.
+PEG parsing, error reporting, error correction and runtime, to name just a few, are aspects that seem to cause headaches
+to developers and researchers alike.
+
+
+A recent paper
+follows an idea for regular expressions: it adapts the notion of
+derivatives of regular expressions to grammars. The idea is to implement in a functional programming language
+the parsing algorithm proposed in this paper and to try it out with some sample data. There is also
+a recent PhD thesis about derivative-based parsing
+Efficient Parsing with Derivatives and Zippers.
+
+ Description:
+ Compilers translate high-level programs that humans can read and write into
+ efficient machine code that can be run on a CPU or virtual machine.
+ A compiler for a simple functional language generating X86 code is described
+ here.
+ I recently implemented a very simple compiler for an even simpler functional
+ programming language following this
+ paper
+ (also described here).
+ My code, written in Scala, of this compiler is
+ here.
+ The compiler can deal with simple programs involving natural numbers, such
+ as Fibonacci numbers or factorial (but it can be easily extended - that is not the point).
+
+
+
+ While the hard work has been done (understanding the two papers above),
+ my compiler only produces some idealised machine code. For example I
+ assume there are infinitely many registers. The goal of this
+ project is to generate machine code that is more realistic and can
+ run on a CPU, like X86, or run on a virtual machine, say the JVM.
+ This would probably give a speedup of a thousand times in comparison to
+ my naive machine code and virtual machine. The project
+ requires digging into the literature about real CPUs and generating
+ real machine code.
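To illustrate the difference between idealised code and code for a real target, here is a minimal sketch of compiling arithmetic expressions to stack-machine instructions. It is written in Python for illustration and is not the Scala compiler mentioned above. The instruction names ldc, iadd, isub and imul are real JVM mnemonics, but everything else (Num, BinOp, compile_expr, run) is hypothetical, and run is only a toy evaluator, not a JVM: it ignores, for example, that JVM integers are 32 bits wide.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str                  # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]

# JVM mnemonics for integer arithmetic on the operand stack
JVM_OPS = {"+": "iadd", "-": "isub", "*": "imul"}


def compile_expr(e):
    """Post-order walk: code for both operands first, then the operator."""
    if isinstance(e, Num):
        return [f"ldc {e.value}"]
    return compile_expr(e.left) + compile_expr(e.right) + [JVM_OPS[e.op]]


def run(code):
    """Toy evaluator for the instructions above (an assumption, not a real JVM)."""
    stack = []
    for instr in code:
        if instr.startswith("ldc "):
            stack.append(int(instr[4:]))
        else:
            y, x = stack.pop(), stack.pop()
            stack.append({"iadd": x + y, "isub": x - y, "imul": x * y}[instr])
    return stack.pop()


if __name__ == "__main__":
    expr = BinOp("*", BinOp("+", Num(2), Num(3)), Num(4))   # (2 + 3) * 4
    code = compile_expr(expr)
    print(code)        # ['ldc 2', 'ldc 3', 'iadd', 'ldc 4', 'imul']
    print(run(code))   # 20
```

A stack machine needs no register allocation at all, which is one reason the JVM is a forgiving first target; for x86 the infinitely many registers would have to be mapped onto a handful of real ones.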
+
+
+ An alternative is to not generate machine code, but build a compiler that compiles to
+ JavaScript. This is the language that is supported by most
+ browsers and therefore is a favourite
+ vehicle for Web-programming. Some call it the scripting language of the Web.
+ Unfortunately, JavaScript is also probably one of the worst
+ languages to program in (being designed and released in a hurry). But it can be used as a convenient target
+ for translating programs from other languages. In particular there are two
+ very optimised subsets of JavaScript that can be used for this purpose:
+ one is asm.js and the other is
+ emscripten. Since
+ last year there is even the official Webassembly.
+ There is a tutorial for emscripten
+ and an impressive demo which runs the
+ Unreal Engine 3
+ in a browser with spectacular speed. This was achieved by compiling the
+ C-code of the Unreal Engine to the LLVM intermediate language and then translating the LLVM
+ code to JavaScript.
+
+
+
+ Literature:
+ There is a lot of literature about compilers
+ (for example this book -
+ I can lend you my copy for the duration of the project, or this
+ online book). A very good overview article
+ about implementing compilers by
+ Laurie Tratt is
+ here.
+ An online book about the Art of Assembly Language is
+ here.
+ An introduction into x86 machine code is here.
+ Intel's official manual for the x86 instruction set is
+ here.
+ Two assemblers for the JVM are described here
+ and here.
+ An interesting twist of this project is to not generate code for a CPU, but
+ for the intermediate language of the LLVM compiler
+ (also described here). If you want to see
+ what machine code looks like you can compile your C-program using gcc -S.
+
+
+ If JavaScript is chosen as a target instead, then there are plenty of tutorials on the Web.
+ Here is a list of free books on JavaScript.
+ A project from which you can draw inspiration is this
+ Lisp-to-JavaScript
+ translator. Here is another such project.
+ And another in less than 100 lines of code.
+ Coffeescript is a similar project
+ except that it is already quite mature. And finally not to
+ forget TypeScript developed by Microsoft. The main
+ difference between these projects and this one is that they translate into relatively high-level
+ JavaScript code; none of them use the much lower-level asm.js and
+ emscripten.
+
+
+ Skills:
+ This is a project for a student with a deep interest in programming languages and
+ compilers. Since my compiler is implemented in Scala,
+ it would make sense to continue this project in this language. I can be
+ of help with questions and books about Scala.
+ But if Scala is a problem, my code can also be translated quickly into any other functional
+ language. Again, it will be of great help for this project to take part in
+ my Compilers and Formal Languages module (6CCS3CFL).
+
+
+
+ PS: Compiler projects consistently received high marks in the past.
+ I have supervised eight so far and most of them received a mark above 70% - one even was awarded a prize.
+
+
+
[CU4] Webassembly Interpreter / Compiler
+
+
+Webassembly is a recently agreed standard for speeding up web applications in browsers. In this
+project the aim is to implement an interpreter or compiler for Webassembly. There are already
+reference interpreters,
+but people take different approaches, for example implementing a
+Forth language on top of Webassembly.
+What is good about Webassembly is that it is a rather simple format, which can be generated quite
+easily, unlike Java class files, which need some head-standing when you generate them.
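As a rough illustration of how simple the core of such an interpreter can be, here is a minimal sketch in Python. It handles only five instructions of the text format (i32.const, local.get, i32.add, i32.sub, i32.mul), given as a plain list of strings. The function name run_wasm and the way locals are passed in are assumptions of this sketch; a real interpreter would also need to parse modules, validate types and handle control flow.

```python
MASK = 0xFFFFFFFF   # i32 arithmetic wraps around modulo 2**32


def run_wasm(body, local_vars):
    """Run a straight-line function body and return the final operand stack.

    body       -- list of instructions such as "i32.const 7" or "local.get 0"
    local_vars -- list of integer values for the function's locals
    """
    stack = []
    for instr in body:
        op, *args = instr.split()
        if op == "i32.const":
            stack.append(int(args[0]) & MASK)
        elif op == "local.get":
            stack.append(local_vars[int(args[0])] & MASK)
        elif op in ("i32.add", "i32.sub", "i32.mul"):
            y, x = stack.pop(), stack.pop()
            if op == "i32.add":
                result = x + y
            elif op == "i32.sub":
                result = x - y
            else:
                result = x * y
            stack.append(result & MASK)
        else:
            raise ValueError(f"unsupported instruction: {instr}")
    return stack


if __name__ == "__main__":
    add = ["local.get 0", "local.get 1", "i32.add"]
    print(run_wasm(add, [2, 40]))                                  # [42]
    print(run_wasm(["i32.const -1", "i32.const 1", "i32.add"], []))  # [0]
```

Values are stored as unsigned 32-bit numbers here, so -1 is represented as 4294967295 and adding 1 wraps around to 0.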
+
+ The standard technology for writing scientific papers in Computer Science is to use
+ LaTeX, a document preparation
+ system originally implemented by Donald Knuth
+ and Leslie Lamport.
+ LaTeX produces very pleasant-looking documents, can deal nicely with mathematical
+ formulas and is very flexible. If you are interested, here
+ is a side-by-side comparison between Word and LaTeX (which LaTeX “wins” with 18 out of 21 points).
+ Computer scientists not only use LaTeX for documents,
+ but also for slides (really, nobody who wants to be cool uses Keynote or Powerpoint).
+
+
+
+ Although used widely, LaTeX seems nowadays a bit dated for producing
+ slides. Unlike documents, which are typically “static” and published in a book or journal,
+ slides often contain changing contents that might first only be partially visible and
+ only later be revealed as the “story” of a talk or lecture demands.
+ Also slides often contain animated algorithms where each state in the
+ calculation is best explained by highlighting the changing data.
+
+
+
+ It seems HTML and JavaScript are much better suited for generating
+ such animated slides. This page
+ links to slide-generating programs using this combination of technologies.
+ However, the problem with all of these projects is that they depend heavily on the users being
+ able to write JavaScript, CSS or HTML...not something one would like to depend on given that
+ “normal” users likely only have a LaTeX background. The aim of this project is to invent a
+ very simple language that is inspired by LaTeX and then generate from code written in this language
+ slides that can be displayed in a web-browser. An example would be the
+ Madoko project.
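To make the aim more concrete, here is a minimal sketch of such a translation, written in Python for illustration. The input syntax (a slide environment with a title and \item lines) is purely hypothetical and only meant to look familiar to LaTeX users; the function name to_html and the generated HTML structure are assumptions of this sketch as well. Mathematical text between \( and \) is passed through unchanged, so that a library running in the browser could render it.

```python
import html
import re

# Hypothetical input syntax: \begin{slide}{Title} ... \end{slide}
SLIDE = re.compile(r"\\begin\{slide\}\{(.*?)\}(.*?)\\end\{slide\}", re.DOTALL)


def to_html(source):
    """Translate every slide environment into one HTML <section>."""
    sections = []
    for title, body in SLIDE.findall(source):
        items = []
        for line in body.splitlines():
            line = line.strip()
            if line.startswith("\\item"):
                # Escape HTML special characters; LaTeX maths is left as is.
                items.append(html.escape(line[len("\\item"):].strip()))
        bullets = "".join(f"<li>{item}</li>" for item in items)
        sections.append(
            f"<section><h2>{html.escape(title)}</h2><ul>{bullets}</ul></section>"
        )
    return "\n".join(sections)


if __name__ == "__main__":
    source = r"""
\begin{slide}{Quadratic Equations}
\item When \(a \ne 0\), there are two solutions
\item They are given by the usual formula
\end{slide}
"""
    print(to_html(source))
```

A real project would of course need a proper parser instead of a regular expression, and a way to describe which parts of a slide are revealed at which step.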
+
+
+
+ This sounds complicated, but there is already some help available:
+ Mathjax is a JavaScript library that can
+ be used to display mathematical text, for example
+
+
+
When \(a \ne 0\), there are two solutions to \(ax^2 + bx + c = 0\) and they are
+ \(x = {-b \pm \sqrt{b^2-4ac} \over 2a}\).
+
+
+
+ by writing code in the familiar LaTeX-way. This can be reused.
+ Another such library is KaTeX.
+ There are also plenty of JavaScript
+ libraries for graphical animations (for example
+ Raphael,
+ SVG.JS,
+ Bonsaijs,
+ JSXGraph). The inspiration for how the user should be able to write
+ slides could come from the LaTeX packages Beamer
+ and PGF/TikZ. A slide-making project from which
+ inspiration can be drawn is hyhyhy.
+
+
+
+ Skills:
+ This is a project that requires good knowledge of JavaScript. You need to be able to
+ parse a language and translate it to a suitable part of JavaScript using
+ appropriate libraries. Tutorials for JavaScript are here.
+ A parser generator for JavaScript is here. There are probably also
+ others. If you want to avoid JavaScript there are a number of alternatives: for example the
+ Elm
+ language has been especially designed for implementing interactive animations, which would be
+ very convenient for this project. A nice slide making project done by a previous student is
+ MarkSlides by Oleksandr Cherednychenko.
+
+
+
[CU6] Raspberry Pi's and Arduinos
+
+
+ Description:
+ This project is for true hackers! Raspberry Pi's
+ are small Linux computers the size of a credit card that cost only £26; the
+ simplest version costs only £5 (see pictures on the left below). They were introduced
+ in 2012 and people went crazy...well some of them. There is a
+ Google+
+ community about Raspberry Pi's that has more
+ than 300k followers. A similar number follow the corresponding group
+ on Facebook. It is hard to keep up with what people do with these small computers. The possibilities
+ seem to be limitless. The main resource for Raspberry Pi's is here.
+ There are magazines dedicated to them and tons of
+ books (not to mention
+ floods of online material,
+ such as the RPi projects book).
+ Google just released a
+ framework
+ for web-programming on Raspberry Pi's turning them into webservers.
+ In my home one Raspberry Pi has the very important task of automatically filtering out
+ nearly all advertisements using the
+ Pi-Hole software
+ (you cannot imagine what a difference this makes to your web experience).
+
+
+
+ Arduinos are slightly older (from 2005) but still very cool (see picture on the right below). They
+ are small single-board micro-controllers that can talk to various external gadgets (sensors, motors, etc). Since Arduinos
+ are open-source software and open hardware, there are many clones and add-on boards. As with the Raspberry Pi, there
+ is a lot of material available about Arduinos.
+ The main reference is here. Like the Raspberry Pi's, the good thing about
+ Arduinos is that they can be powered with simple AA-batteries.
+
+
+
+ I have several Raspberry Pi's including wifi-connectors and two cameras.
+ I also have two Freakduino Boards that are Arduinos extended with wireless communication. I can lend them to responsible
+ students for one or two projects. However, the aim is to first come up with an idea for a project. Popular projects are
+ automated temperature sensors, network servers, robots, web-cams (here
+ is a web-cam directed at the Shard that can
+ tell
+ you whether it is raining or cloudy). There are plenty more ideas listed
+ here for Raspberry Pi's and
+ here for Arduinos.
+
+
+
+ There are essentially two kinds of projects: One is purely software-based. Software projects for Raspberry Pi's are often
+ written in Python, but since these are Linux-capable computers any other
+ language would do as well. You can also write your own operating system as done
+ here. For example the students
+ here developed their own bare-metal OS and then implemented
+ a chess-program on top of it (have a look at their very impressive
+ youtube video).
+ The other kind of project is a combination of hardware and software; usually attaching some sensors
+ or motors to the Raspberry Pi or Arduino. This might require some soldering or what is called
+ a bread-board. But be careful before choosing a project
+ involving new hardware: these devices
+ can be destroyed (if “Vin connected to GND” or “drawing more than 30mA from a GPIO”
+ does not make sense to you, you should probably stay away from such a project).
+
+
+
+ Skills:
+ Well, you must be a hacker; happy to make things. Your desk might look like the photo below on the left.
+ The photo below in the middle shows an earlier student project which wirelessly connects a wearable Arduino (packaged
+ in a "self-3d-printed" watch) to a Raspberry Pi seen in the background. The Arduino in the foreground takes
+ measurements of
+ heart rate and body temperature; the Raspberry Pi collects this data and makes it accessible via a simple
+ web-service. The picture on the right is another project that implements an airmouse using an Arduino.
+
+
+
+ A really cool project using a toy helicopter and two Raspberry Pi's was done by Nikolaos Kyknas. He transformed
+ an off-the-shelf toy helicopter into an autonomous flying machine. He attached a Raspberry Pi Zero and an ultrasound
+ sensor to the helicopter for measuring the distance from ground. Another Raspberry Pi is attached to the “ground control
+ unit” in order to give instructions to the throttle of the helicopter. Both Raspberry Pi's communicate over WiFi for calculating
+ the next flight instruction. The goal is to find and maintain a steady altitude. Sounds simple? Well, not so fast!
+ First you need to get the balance of the helicopter plus Raspberry Pi plus its power source just right,
+ otherwise the helicopter will simply take off in random directions. Also the flight instructions need to be just right,
+ otherwise the helicopter would at best “oscillate” around the set altitude, but never be steady. To solve this problem,
+ Nikolaos used exactly the same algorithm that keeps cars at a steady pace when in cruise control.
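Cruise control is commonly implemented with a PID controller (proportional, integral, derivative), so the sketch below assumes that this is the kind of algorithm meant here. It is written in Python and is not Nikolaos's code: the class PID, the function simulate, the gains and the very crude physical model (a point mass with thrust and gravity, no ground, no limits on the throttle) are all assumptions made for illustration.

```python
class PID:
    """Textbook PID controller; the gains kp, ki and kd are tuning parameters."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        """Return the control output for the current error."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0          # no previous measurement yet
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(target=1.0, seconds=30.0, dt=0.02, gravity=9.81):
    """Simulate a point mass whose vertical thrust is set by the controller.

    Returns the altitude at the end of the simulation. The model ignores
    the ground, air resistance and throttle limits.
    """
    pid = PID(kp=4.0, ki=2.0, kd=4.0)   # illustrative gains, not tuned for real hardware
    height, velocity = 0.0, 0.0
    for _ in range(int(seconds / dt)):
        thrust = pid.update(target - height, dt)
        velocity += (thrust - gravity) * dt
        height += velocity * dt
    return height


if __name__ == "__main__":
    print(f"altitude after 30s: {simulate():.3f} m")
```

The three terms have distinct jobs in this model: the integral term builds up until it cancels gravity, and the derivative term damps the motion. With the proportional term alone, the simulated helicopter would oscillate around the set altitude forever, which is the behaviour described above.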
+
+
+
+
+
+
+
[CU7] An Infrastructure for Displaying and Animating Code in a Web-Browser
+
+
+ Description:
+ The project aim is to implement an infrastructure for displaying and
+ animating code in a web-browser. The infrastructure should be agnostic
+ with respect to the programming language, but should be configurable.
+ I envisage something smaller than the projects
+ here (for Python),
+ here (for Java),
+ here (for multiple languages),
+ here (for HTML),
+ here (for JavaScript),
+ and here (for Scala).
+
+
+
+ The tasks in this project are (1) to lex and parse languages and (2) to write an interpreter.
+ The goal is to implement this as much as possible in a language-agnostic fashion.
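One way to read "language-agnostic but configurable" is that the language-specific knowledge lives in a table, while the machinery stays generic. The following Python sketch shows this for the lexing step only; the function make_lexer and the token table WHILE_LANG for a small imaginary while-language are assumptions of this sketch. In the actual project the same idea would probably be implemented in JavaScript or a language compiling to it.

```python
import re


def make_lexer(token_spec):
    """Build a lexer from a list of (token name, regular expression) pairs.

    The pairs are tried in order; tokens named SKIP are dropped. The regular
    expressions must not contain capturing groups of their own.
    """
    master = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in token_spec))

    def lex(text):
        pos, tokens = 0, []
        while pos < len(text):
            m = master.match(text, pos)
            if m is None:
                raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
            if m.lastgroup != "SKIP":
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
        return tokens

    return lex


# Token table for a small imaginary while-language (hypothetical example)
WHILE_LANG = [
    ("NUM", r"\d+"),
    ("KEYWORD", r"\b(?:while|if|else)\b"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/<>=]+|:="),
    ("PUNCT", r"[(){};]"),
    ("SKIP", r"\s+"),
]

if __name__ == "__main__":
    lex = make_lexer(WHILE_LANG)
    for token in lex("while (x < 10) x := x + 1;"):
        print(token)
```

Supporting another language then only means supplying another table; the parser and the interpreter would need a similar separation between generic machinery and language-specific configuration.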
+
+
+
+ Skills:
+ Good skills in lexing and language parsing, as well as being fluent with web programming (for
+ example JavaScript).
+
+
+
+
[CU8] Proving the Correctness of Programs
+
+
+ I am one of the main developers of the interactive theorem prover
+ Isabelle. This theorem prover
+ has been used to establish the correctness of some quite large
+ programs (for example an operating system).
+ Together with colleagues from Nanjing, I used this theorem prover to establish the correctness of a
+ scheduling algorithm, called
+ Priority Inheritance,
+ for real-time operating systems. This scheduling algorithm is part of the operating
+ system that drives, for example, the
+ Mars rovers.
+ Actually, the very first Mars rover mission in 1997 did not have this
+ algorithm switched on and it almost caused a catastrophic mission failure (see
+ this youtube video here
+ for an explanation of what happened).
+ We were able to prove the correctness of this algorithm, but were also able to
+ establish the correctness of some optimisations in this
+ paper.
+
+
+
On a much smaller scale, there are a few small programs and underlying algorithms where it
+ is not really understood whether they always compute a correct result (for example the
+ regular expression matcher by Sulzmann and Lu in project [CU1]). The aim of this
+ project is to completely specify an algorithm in Isabelle and then prove it correct (that is,
+ it always computes the correct result).
+
+
+
+ Skills:
+ This project is for a very good student with a knack for theoretical things and formal reasoning.
+
+
+
[CU9] Anything Security Related that is Interesting
+
+
+If you have your own project that is related to security (must be
+something interesting), please propose it. We can then see
+whether it would be suitable for a project.
+
+
+
[CU10] Anything Interesting in the Areas
+
+
+
Elm (a reactive functional language for animating webpages; have a look at the cool examples, or here for an introduction)
+
SMLtoJS (an ML compiler to JavaScript; or anything else related to
+ sane languages that compile to JavaScript)
+
Any statistical data related to Bitcoins (in the spirit of this
+paper or
+ this one; this will probably require extensive knowledge of C or
+ another heavy-duty programming language)
+
Anything related to programming languages and formal methods (like
+ static program analysis)
+
Anything related to low-cost, hands-on hardware like Raspberry Pi, Arduino,
+ Cubieboard
+
Anything related to unikernel operating systems, like
+ Xen or
+ Mirage OS
+
Any kind of applied hacking, for example the Arduino-based keylogger described
+ here
+