diff -r 37b2df329532 -r 846966afdad1 msc-projects-14.html --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/msc-projects-14.html Sun Nov 09 21:37:59 2014 +0000 @@ -0,0 +1,629 @@ + + + +2014/15 MSc Projects + + + + + + + + + + + + + + +
+ +

2014/15 MSc Projects

+

Supervisor: Christian Urban

+

Email: christian dot urban at kcl dot ac dot uk, Office: Strand Building S1.27

+

If you are interested in a project, please send me an email so that we can discuss the details. Please include +a short description of your programming skills and Computer Science background in your first email. +I will also need your King's username in order to book the project for you. Thanks.

+ +

Note that besides being a lecturer at the theoretical end of Computer Science, I am also a passionate + hacker … + defined as “a person who enjoys exploring the details of programmable systems and + stretching their capabilities, as opposed to most users, who prefer to learn only the minimum + necessary.” I am always happy to supervise like-minded students.

+ +
    +
  • [CU1] Regular Expression Matching and Derivatives

    + +

    + Description: + Regular expressions + are extremely useful for many text-processing tasks such as finding patterns in texts, + lexing programs, syntax highlighting and so on. Given that regular expressions were + introduced in 1951 by Stephen Kleene, + you might think regular expressions have since been studied and implemented to death. But you would definitely be + mistaken: in fact they are still an active research area. For example + this paper + about regular expression matching and derivatives was presented just last summer at the international + FLOPS'14 conference. The task in this project is to implement their results.

    + +

    The background for this project is that some regular expressions are + “evil” + and can “stab you in the back” according to + this blog post. + For example, if you take the + innocent-looking regular expression a?{28}a{28} (that is, 28 copies of a? followed by 28 copies of a) and match it in Python or + in Ruby (or in a number of other mainstream programming languages, according to this + blog) against the string + aaaaaaaaaaaaaaaaaaaaaaaaaaaa (that is, 28 as), you will soon notice that your CPU usage goes to 100%. In fact, + Python and Ruby need approximately 30 seconds of hard work to match this string. You can try it for yourself: + re.py (Python version) and + re.rb + (Ruby version). You can imagine an attacker + mounting a nice DoS attack against + your program if it contains such an “evil” regular expression. Actually + Scala (and also Java) are almost immune to such + attacks as they can deal with strings of up to 4,300 as in less than a second. But if you scale + the regular expression and string further to, say, 4,600 as, then you get a StackOverflowError, + potentially crashing your program. Moreover (besides the “minor” problem of being painfully slow), according to this + report, + nearly all POSIX regular expression matchers are actually buggy. +
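The following is a minimal Python sketch of the experiment described above, for readers who want to reproduce it without the linked re.py. We assume the notation a?{n}a{n} stands for n copies of a? followed by n copies of a, and spell it out explicitly. The helper names `evil_pattern` and `time_match` are our own. The loop stops at n = 22 so that it finishes quickly. Timings depend on the machine and the Python version, so none are quoted here.

```python
import re
import time
from typing import Tuple

def evil_pattern(n: int) -> str:
    """Spell out a?{n}a{n}: n optional a's followed by n mandatory a's."""
    return "a?" * n + "a" * n

def time_match(n: int) -> Tuple[bool, float]:
    """Match a^n with Python's backtracking re engine; return (matched, seconds)."""
    pattern = re.compile(evil_pattern(n))
    start = time.perf_counter()
    matched = pattern.fullmatch("a" * n) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    # Keep n small: the running time grows exponentially with n.
    for n in range(10, 23, 4):
        matched, seconds = time_match(n)
        print(f"n = {n:2d}: matched = {matched}, {seconds:.4f} s")
```

Python's `re` is a backtracking engine. In this example each extra a roughly doubles the running time, which is how 28 as end up taking seconds.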

    + +

    + On a rainy afternoon, I implemented + this + regular expression matcher in Scala. It is not as fast as the official one in Scala, but + it can match up to 11,000 as in less than 5 seconds without raising any exception + (remember Python and Ruby both need nearly 30 seconds to process 28(!) as, and Scala's + official matcher maxes out at 4,600 as). My matcher consists of approximately + 85 lines of code and is based on the concept of + derivatives of regular expressions. + These derivatives were introduced in 1964 by + Janusz Brzozowski, but, according to this + paper, had been lost in the “sands of time”. + The advantage of derivatives is that they completely side-step the usual + translations of regular expressions + into NFAs or DFAs (where the subset construction can produce exponentially many states), as well as the backtracking search responsible for the exponential behaviour exhibited by the regular + expression matchers in Python and Ruby. +
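To illustrate the idea, here is a minimal Python sketch of a matcher based on Brzozowski derivatives. It is not a translation of the Scala matcher linked above. The constructor names (`Zero`, `One`, `Chr`, `Alt`, `Seq`, `Star`) and the simplification function `simp` are our own choices. In particular, we assume that flattening alternatives and dropping duplicates keeps the derivatives of the “evil” expression small, which is why `simp` is applied after every step.

```python
from dataclasses import dataclass
from typing import List

class Rexp:
    """Base class for regular expressions."""

@dataclass(frozen=True)
class Zero(Rexp):   # matches nothing
    pass

@dataclass(frozen=True)
class One(Rexp):    # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr(Rexp):    # matches a single character
    c: str

@dataclass(frozen=True)
class Alt(Rexp):    # r1 + r2
    r1: Rexp
    r2: Rexp

@dataclass(frozen=True)
class Seq(Rexp):    # r1 . r2
    r1: Rexp
    r2: Rexp

@dataclass(frozen=True)
class Star(Rexp):   # r*
    r: Rexp

def nullable(r: Rexp) -> bool:
    """Can r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero, Chr

def der(c: str, r: Rexp) -> Rexp:
    """Derivative of r w.r.t. c: matches s exactly when r matches c followed by s."""
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    return Zero()  # Zero, One

def alts(r: Rexp) -> List[Rexp]:
    """Flatten nested alternatives into a list."""
    return alts(r.r1) + alts(r.r2) if isinstance(r, Alt) else [r]

def simp(r: Rexp) -> Rexp:
    """0 + r = r, 0 . r = r . 0 = 0, 1 . r = r . 1 = r; duplicate alternatives dropped."""
    if isinstance(r, Alt):
        seen = set()
        parts: List[Rexp] = []
        for a in alts(Alt(simp(r.r1), simp(r.r2))):
            if not isinstance(a, Zero) and a not in seen:
                seen.add(a)
                parts.append(a)
        if not parts:
            return Zero()
        out = parts[-1]
        for a in reversed(parts[:-1]):
            out = Alt(a, out)
        return out
    if isinstance(r, Seq):
        s1, s2 = simp(r.r1), simp(r.r2)
        if isinstance(s1, Zero) or isinstance(s2, Zero):
            return Zero()
        if isinstance(s1, One):
            return s2
        if isinstance(s2, One):
            return s1
        return Seq(s1, s2)
    return r

def matches(r: Rexp, s: str) -> bool:
    """Take one (simplified) derivative per character, then test for the empty string."""
    for c in s:
        r = simp(der(c, r))
    return nullable(r)

def evil(n: int) -> Rexp:
    """a?{n} a{n}, built as a right-nested sequence."""
    parts = [Alt(One(), Chr("a"))] * n + [Chr("a")] * n
    r: Rexp = One()
    for p in reversed(parts):
        r = Seq(p, r)
    return r
```

For example, `matches(evil(28), "a" * 28)` returns `True`. It gets there by repeatedly rewriting a single expression, without any backtracking.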

    + +

    + Now the authors from the + FLOPS'14-paper mentioned + above claim they are even faster than me and can deal with even more features of regular expressions + (for example subexpression matching, which my rainy-afternoon matcher cannot). I am sure they thought + about the problem much longer than a single afternoon. The task + in this project is to find out how good they actually are by implementing the results from their paper. + Their approach is also based on the concept of derivatives. + I used them once myself in a paper + in order to prove the Myhill-Nerode theorem. + So I know they are worth their money. Still, it would be interesting to actually compare their results + with my simple rainy-afternoon matcher and potentially “blow away” the regular expression matchers + in Python and Ruby (and possibly in Scala too). One application would be a fast lexer for + programming languages. +

    + +

    + Literature: + The place to start with this project is obviously this + paper. + Traditional methods for regular expression matching are explained + in the Wikipedia articles + here and + here. + The authoritative book + on automata and regular expressions is by John Hopcroft and Jeffrey Ullman (available in the library). + There is also an online course about this topic by Ullman at + Coursera, though IMHO not + done with love. + Finally, there are millions of other pointers about regular expression + matching on the Web. I found the chapter on Lexing in this + online book very helpful. + Test cases for “evil” + regular expressions can be obtained from here. + 

    + +

    + Skills: + This is a project for a student with an interest in theory and some + reasonable programming skills. The project can easily be implemented + in functional languages like + Scala, + F#, + ML, + Haskell, etc. Python and other non-functional languages + can also be used, but seem much less convenient. +

    + +
  • [CU2] A Compiler for a small Programming Language

    + +

    + Description: + Compilers translate high-level programs that humans can read and write into + efficient machine code that can be run on a CPU or virtual machine. + A compiler for a simple functional language generating X86 code is described + here. + I recently implemented a very simple compiler for an even simpler functional + programming language following this + paper + (also described here). + My Scala code for this compiler is + here. + The compiler can deal with simple programs involving natural numbers, such + as Fibonacci numbers or factorial (but it can easily be extended; that is not the point). +

    + +

    + While the hard work has been done (understanding the two papers above), + my compiler only produces some idealised machine code. For example, I + assume there are infinitely many registers. The goal of this + project is to generate machine code that is more realistic and can + run on a CPU, like X86, or on a virtual machine, say the JVM. + This would probably give a speed-up of a thousand times compared with + my naive machine code and virtual machine. The project + requires digging into the literature about real CPUs and about generating + real machine code. +
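When targeting the JVM, the "infinitely many registers" assumption can be avoided because the JVM is a stack machine. Intermediate values live on the operand stack, so no register allocation is needed. The following minimal Python sketch compiles arithmetic expressions by a post-order traversal. It rests on assumptions of our own:

- a hypothetical expression type with `Num`, `Var` and `Op`;
- Jasmin-style mnemonics `ldc`, `iload`, `iadd`, `isub` and `imul`.

The small `run` function is only a stand-in interpreter for checking the output; it is not the JVM itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class Num:
    n: int

@dataclass
class Var:
    name: str

@dataclass
class Op:
    op: str        # "+", "-" or "*"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, Op]

# JVM mnemonics for the integer operations.
JVM_OPS = {"+": "iadd", "-": "isub", "*": "imul"}

def compile_expr(e: Expr, env: Dict[str, int]) -> List[str]:
    """Post-order traversal: code for both operands first, then the operator.
    `env` maps variable names to JVM local-variable slots."""
    if isinstance(e, Num):
        return [f"ldc {e.n}"]
    if isinstance(e, Var):
        return [f"iload {env[e.name]}"]
    return compile_expr(e.left, env) + compile_expr(e.right, env) + [JVM_OPS[e.op]]

def run(code: List[str], local_vars: List[int]) -> int:
    """Toy stack-machine interpreter for the generated code (ignores 32-bit overflow)."""
    stack: List[int] = []
    for instr in code:
        name, *args = instr.split()
        if name == "ldc":
            stack.append(int(args[0]))
        elif name == "iload":
            stack.append(local_vars[int(args[0])])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append({"iadd": left + right, "isub": left - right,
                          "imul": left * right}[name])
    return stack.pop()
```

For instance, `(1 + 2) * x` with `x` in local slot 0 compiles to `ldc 1`, `ldc 2`, `iadd`, `iload 0`, `imul`. A real backend would also need to wrap this code into a class file (for example via a JVM assembler) and to respect the JVM's 32-bit integer arithmetic.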

    +

    + An alternative is not to generate machine code, but to build a compiler that compiles to + JavaScript. This is the language that is supported by most + browsers and therefore is a favourite + vehicle for Web-programming. Some call it the scripting language of the Web. + Unfortunately, JavaScript is also probably one of the worst + languages to program in (having been designed and released in a hurry). But it can be used as a convenient target + for translating programs from other languages. In particular there are two + technologies that can be used for this purpose: + one is asm.js, a highly optimisable subset of JavaScript, and the other is + emscripten, a compiler from the LLVM intermediate language to JavaScript that can emit asm.js. + There is a tutorial for emscripten + and an impressive demo which runs the + Unreal Engine 3 + in a browser with spectacular speed. This was achieved by compiling the + C++ code of the Unreal Engine to the LLVM intermediate language and then translating the LLVM + code to JavaScript. +

    + +

    + Literature: + There is a lot of literature about compilers + (for example this book; + I can lend you my copy for the duration of the project, or this + online book). A very good overview article + about implementing compilers by + Laurie Tratt is + here. + An online book about the Art of Assembly Language is + here. + An introduction to x86 machine code is here. + Intel's official manual for the x86 instruction set is + here. + A simple assembler for the JVM is described here. + An interesting twist on this project would be to generate code not for a CPU, but + for the intermediate language of the LLVM compiler + (also described here). If you want to see + what the assembly code for a C program looks like, you can compile it using gcc -S. +

    +

    + If JavaScript is chosen as a target instead, then there are plenty of tutorials on the Web. + Here is a list of free books on JavaScript. + A project from which you can draw inspiration is this + Lisp-to-JavaScript + translator. Here is another such project. + And another one in less than 100 lines of code. + CoffeeScript is a similar project, + except that it is already quite mature. And finally, not to + forget, TypeScript, developed by Microsoft. The main + difference between these projects and this one is that they translate into relatively high-level + JavaScript code; none of them targets the much lower-level asm.js or uses + emscripten. +

    +

    + Skills: + This is a project for a student with a deep interest in programming languages and + compilers. Since my compiler is implemented in Scala, + it would make sense to continue this project in this language. I can be + of help with questions and books about Scala. + But if Scala is a problem, my code can also be translated quickly into any other functional + language. +

    + +

    + PS: Compiler projects, like this one or [CU8], consistently received high marks in the past. + I have supervised four so far and none of them received a mark below 70%; one was even awarded a prize. +

    + +
  • [CU3] Slide-Making in the Web-Age

    + +

    + The standard technology for writing scientific papers in Computer Science is to use + LaTeX, a document preparation + system implemented by Leslie Lamport + on top of Donald Knuth's TeX. + LaTeX produces very pleasant-looking documents, can deal nicely with mathematical + formulas and is very flexible. If you are interested, here + is a side-by-side comparison between Word and LaTeX (which LaTeX “wins” with 18 out of 21 points). + Computer scientists use LaTeX not only for documents, + but also for slides (really, nobody who wants to be cool uses Keynote or PowerPoint). +

    + +

    + Although widely used, LaTeX nowadays seems a bit dated for producing + slides. Unlike documents, which are typically “static” and published in a book or journal, + slides often contain changing content that might at first be only partially visible and + only later be revealed as the “story” of a talk or lecture demands. + Slides also often contain animated algorithms, where each state in the + calculation is best explained by highlighting the changing data. +

    + +

    + It seems HTML and JavaScript are much better suited for generating + such animated slides. This page + links to 22 slide-generating programs using this combination of technologies. + However, the problem with all of these projects is that they depend heavily on the users being + able to write JavaScript, CSS or HTML, which is not something one would like to depend on, given that + “normal” users likely only have a LaTeX background. The aim of this project is to invent a + very simple language that is inspired by LaTeX and then to generate, from code written in this language, + slides that can be displayed in a web-browser. +
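Here is a minimal Python sketch of such a translator. It handles a hypothetical Beamer-like mini-language that supports only `\begin{frame}{...}`, `\begin{itemize}`, `\item` and `\pause`. The output format is our own assumption: one `<section class="slide">` per frame, with a `data-step` attribute counting the preceding `\pause`s. A small piece of JavaScript in the browser would still be needed to reveal elements step by step. Math between `\(` and `\)` is passed through unchanged for MathJax or KaTeX.

```python
import html
import re
from typing import List

# Hypothetical mini-language modelled on Beamer:
#   \begin{frame}{Title} ... \end{frame}      one slide
#   \begin{itemize} \item ... \end{itemize}   a bullet list
#   \pause                                    later content appears one step later
FRAME_START = re.compile(r"\\begin\{frame\}\{(.*)\}")

def slides_to_html(src: str) -> str:
    out: List[str] = []
    step = 0
    for raw in src.splitlines():
        line = raw.strip()
        title = FRAME_START.fullmatch(line)
        if title:
            step = 0  # every slide starts at step 0
            out.append(f'<section class="slide"><h2>{html.escape(title.group(1))}</h2>')
        elif line == r"\end{frame}":
            out.append("</section>")
        elif line == r"\begin{itemize}":
            out.append("<ul>")
        elif line == r"\end{itemize}":
            out.append("</ul>")
        elif line == r"\pause":
            step += 1
        elif line.startswith(r"\item "):
            text = html.escape(line[len(r"\item "):])
            out.append(f'<li data-step="{step}">{text}</li>')
        elif line:
            # Plain text; math such as \( x^2 \) is left for MathJax or KaTeX.
            out.append(f'<p data-step="{step}">{html.escape(line)}</p>')
    return "\n".join(out)
```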

    + +

    + This sounds complicated, but there is already some help available: + MathJax is a JavaScript library that can + be used to display mathematical text, for example + 

    +

    When \(a \ne 0\), there are two solutions to \(ax^2 + bx + c = 0\) and they are + \(x = {-b \pm \sqrt{b^2-4ac} \over 2a}\).

    +
    + + by writing code in the familiar LaTeX-way. This can be reused. + Another such library is KaTeX. + There are also plenty of JavaScript + libraries for graphical animations (for example + Raphael, + SVG.JS, + Bonsaijs, + JSXGraph). The inspiration for how the user should be able to write + slides could come from the LaTeX packages Beamer + and PGF/TikZ. +

    + +

    + Skills: + This project requires good knowledge of JavaScript. You need to be able to + parse a language and translate it to a suitable part of JavaScript using + appropriate libraries. Tutorials for JavaScript are here. + A parser generator for JavaScript is here. There are probably also + others. If you want to avoid JavaScript, there are a number of alternatives: for example, the + Elm + language was designed especially to make interactive animations easy to implement, which would be + very convenient for this project. +

    + +
  • [CU4] An Online Student Voting System

    + +

    + Description: + One of the more annoying aspects of giving a lecture is to ask the students a question + and, no matter how easy the question is, not to + receive any answer. The online course system + Udacity, in contrast, made an art out of + asking questions during lectures (see for example the + Web Application Engineering + course CS253). + The lecturer there gives multiple-choice questions as part of the lecture and the students need to + click on the appropriate answer. This works very well in the online world. + For “real-world” lectures, the department has some + clickers + (these are little devices that form part of an audience response system). However, + they are a logistical nightmare for the lecturer: they need to be distributed + during the lecture and collected at the end. Nowadays, when students + come to lectures with their own laptop or smartphone, this can + be improved. +

    + +

    + The task of this project is to implement an online student + polling system. The lecturer should be able to prepare + questions beforehand (encoded as some web-form) and be able to + show them during the lecture. The students + can give their answers by clicking on the corresponding webpage. + The lecturer can then collect the responses online and evaluate them + immediately. Such a system is sometimes called + HTML voting. + There are a number of commercial + solutions for this problem, but they are not easy to use (in addition + to being ridiculously expensive). A good student can easily improve upon + what they provide. +

    + +

    + The problem of student polling is not as hard as + electronic voting, + which is essentially still an unsolved problem in Computer Science. The + students only need to be prevented from answering a question more than once (which would skew + the statistics). Unlike electronic voting, no audit trail needs to be kept + for student polling. Restricting the number of answers can probably be solved + by setting appropriate cookies on the students' + computers or smartphones. +
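The core bookkeeping for this can be quite small. The following minimal Python sketch assumes a design of our own. Each browser receives a random token, here stored in a cookie named `poll_token` (a hypothetical name). The server then accepts at most one vote per token and question. Networking, persistence and the web-form itself are left out.

```python
import secrets
from collections import Counter
from http.cookies import SimpleCookie
from typing import Dict, Iterable, Set

class Poll:
    """One multiple-choice question with an in-memory tally (hypothetical design)."""

    def __init__(self, question: str, options: Iterable[str]) -> None:
        self.question = question
        self.options = list(options)
        self.counts: Counter = Counter()
        self.voted: Set[str] = set()   # tokens that have already answered

    @staticmethod
    def new_token() -> str:
        """Random identifier the server hands to a student's browser."""
        return secrets.token_hex(16)

    def vote(self, token: str, option: str) -> bool:
        """Record a vote; reject unknown options and repeat votes from the same token."""
        if option not in self.options or token in self.voted:
            return False
        self.voted.add(token)
        self.counts[option] += 1
        return True

    def results(self) -> Dict[str, int]:
        return {o: self.counts[o] for o in self.options}

def cookie_header(token: str) -> str:
    """Set-Cookie header carrying the token; HttpOnly hides it from page scripts."""
    cookie = SimpleCookie()
    cookie["poll_token"] = token
    cookie["poll_token"]["httponly"] = True
    return cookie.output(header="Set-Cookie:")
```

As noted above, this is far weaker than electronic voting. A student who clears their cookies or switches devices obtains a fresh token, which seems acceptable for lecture polls.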

    + +

    + Literature: + The project requires fluency in a web-programming language (for example + JavaScript, + Go, + Scala). However, JavaScript with + the Node.js runtime seems best suited for the job. + Here is a tutorial on Node.js for beginners. + For web-programming, the + Web Application Engineering + course at Udacity is a good starting point + for becoming aware of the issues involved. This course uses Python. + To evaluate the answers from the students, Google's + Chart Tools + might be useful; they are also described in this + YouTube video. +

    + +

    + Skills: + In order to provide convenience for the lecturer, this project needs very good web-programming skills. A + hacker mentality + (see above) is probably also very beneficial: web-programming is an area that only emerged recently and + many tools still lack maturity. You probably have to experiment a lot with several different + languages and tools. +

    + +
  • [CU5] Raspberry Pi's and Arduinos

    + +

    + Description: + This project is for true hackers! Raspberry Pi's + are small Linux computers the size of a credit card and only cost £34 (see picture on the left below). They were introduced + in 2012 and people went crazy... well, some of them. There is a + Google+ community about Raspberry Pi's that has more + than 172k followers. It is hard to keep up with what people do with these small computers. The possibilities + seem to be limitless. The main resource for Raspberry Pi's is here. + There are magazines dedicated to them and tons of + books (not to mention + floods of online material). + Google just released a + framework + for web-programming on Raspberry Pi's, turning them into web servers. +

    + +

    + Arduinos are slightly older (from 2005) but still very cool (see picture on the right below). They + are small single-board micro-controllers that can talk to various external gadgets (sensors, motors, etc.). Since Arduinos + are open-source hardware and software, there are many clones and add-on boards. As for the Raspberry Pi, there + is a lot of material available about Arduinos. + The main reference is here. As with the Raspberry Pi's, a good thing about + Arduinos is that they can be powered with simple AA batteries. +

    + +

    + I have two such Raspberry Pi's including wifi-connectors and two cameras. + I also have two Freakduino Boards that are Arduinos extended with wireless communication. I can lend them to responsible + students for one or two projects. However, the aim is to first come up with an idea for a project. Popular projects are + automated temperature sensors, network servers, robots, web-cams (here + is a web-cam directed at the Shard that can + tell + you whether it is raining or cloudy). There are plenty more ideas listed + here for Raspberry Pi's and + here for Arduinos. +

    + +

    + There are essentially two kinds of projects: one is purely software-based. Software projects for Raspberry Pi's are often + written in Python, but since these are Linux-capable computers, any other + language would do as well. You can also write your own operating system as done + here. For example, the students + here developed their own bare-metal OS and then implemented + a chess program on top of it (have a look at their very impressive + YouTube video). + The other kind of project is a combination of hardware and software, usually attaching some sensors + or motors to the Raspberry Pi or Arduino. This might require some soldering or what is called + a breadboard. But be careful before choosing a project + involving new hardware: these devices + can be destroyed (if “Vin connected to GND” or “drawing more than 30mA from a GPIO” + does not make sense to you, you should probably stay away from such a project). +

    + +

    +

    + + + +
    +

    + +

    + Skills: + Well, you must be a hacker who is happy to make things. Your desk might look like the one in the photo on the left. + The right photo shows an earlier student project which wirelessly connects a wearable Arduino (packaged + in a self-3D-printed watch) to a Raspberry Pi seen in the background. The Arduino takes measurements of + heart rate and body temperature; the Raspberry Pi collects this data and makes it accessible via a simple + web service. +

    + + + +
    +

    + +
  • [CU6] Generating Testcases from a Specification

    + +
  • [CU7] GPRS + GPS for Arduinos

    + +
  • [CU8] Language Translator into JavaScript

    + +

    + Description: + JavaScript is a language that is supported by most + browsers and therefore is a favourite + vehicle for Web-programming. Some call it the scripting language of the Web. + Unfortunately, JavaScript is probably one of the worst + languages to program in (having been designed and released in a hurry). But it can be used as a convenient target + for translating programs from other languages. In particular there are two + technologies that can be used for this purpose: + one is asm.js, a highly optimisable subset of JavaScript, and the other is + emscripten, a compiler from the LLVM intermediate language to JavaScript that can emit asm.js. + There is a tutorial for emscripten + and an impressive demo which runs the + Unreal Engine 3 + in a browser with spectacular speed. This was achieved by compiling the + C++ code of the Unreal Engine to the LLVM intermediate language and then translating the LLVM + code to JavaScript. +

    + +

    + Skills: + This project is about exploring asm.js and emscripten, and implementing a translator + from a small language into JavaScript using them. This is similar to the project [CU2] above and requires + similar skills. In addition, it would be good to already have some familiarity with JavaScript. + There are plenty of tutorials on the Web. + Here is a list of free books on JavaScript. + This is a project for a student who wants to get more familiar with JavaScript and Web-programming. + A project from which you can draw inspiration is this + Lisp-to-JavaScript + translator. Here is another such project. + And another one in less than 100 lines of code. + CoffeeScript is a similar project, + except that it is already quite mature. And finally, not to + forget, TypeScript, developed by Microsoft. The main + difference between these projects and this one is that they translate into relatively high-level + JavaScript code; none of them targets the much lower-level asm.js or uses + emscripten. +
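As a starting point, here is a minimal Python sketch that translates a tiny integer-expression language into asm.js-style JavaScript. The expression classes are hypothetical, and we have not run the output through an asm.js validator. The sketch only follows the asm.js conventions we are aware of:

- a `"use asm"` directive;
- `x = x|0` parameter annotations;
- `|0` coercions on integer results;
- `Math.imul` for multiplication.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Num:
    n: int

@dataclass
class Var:
    name: str

@dataclass
class Op:
    op: str        # "+", "-" or "*"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, Op]

def to_asmjs(e: Expr) -> str:
    """Translate an integer expression; every intermediate result is coerced
    back to a 32-bit integer with |0, and * goes through Math.imul."""
    if isinstance(e, Num):
        return str(e.n)
    if isinstance(e, Var):
        return e.name
    left, right = to_asmjs(e.left), to_asmjs(e.right)
    if e.op == "*":
        return f"(imul({left}, {right})|0)"
    return f"(({left} {e.op} {right})|0)"

def asm_module(fname: str, params: List[str], body: Expr) -> str:
    """Wrap one exported function in an asm.js-style module."""
    lines = ["function Module(stdlib) {",
             '  "use asm";',
             "  var imul = stdlib.Math.imul;",
             f"  function {fname}({', '.join(params)}) {{"]
    lines += [f"    {p} = {p}|0;" for p in params]   # parameter type annotations
    lines += [f"    return {to_asmjs(body)}|0;",     # return type annotation
              "  }",
              f"  return {{ {fname}: {fname} }};",
              "}"]
    return "\n".join(lines)
```

For example, `asm_module("f", ["x"], Op("+", Var("x"), Num(1)))` produces a module whose `f` should return 42 when called as `Module(window).f(41)` in a browser. Even if a browser declines to validate it as asm.js, it is still ordinary JavaScript with the same meaning.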

    + + + +
  • [CU9] An Infrastructure for Displaying and Animating Code in a Web-Browser

    + +

    + Description: + The aim of this project is to implement an infrastructure for displaying and + animating code in a web-browser. The infrastructure should be agnostic + with respect to the programming language, but should be configurable. + I envisage something smaller than the projects + here (for Python), + here (for Java), + here (for multiple languages), + here (for HTML), + here (for JavaScript), + and here (for Scala). +

    + +

    + The tasks in this project are (1) to lex and parse languages and (2) to write an interpreter. + The goal is to implement this as much as possible in a language-agnostic fashion. +
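One way to get a language-agnostic lexer is to drive it entirely by a table of token rules, so that supporting another language only means supplying another table. The following minimal Python sketch uses Python's `re` module rather than the derivative-based matching of [CU1]. The function `make_lexer` and the toy rule table are hypothetical. Because alternatives in `re` are tried in order, keywords must be listed before identifiers, and no rule may match the empty string.

```python
import re
from typing import Callable, Iterator, Sequence, Tuple

Token = Tuple[str, str]

def make_lexer(rules: Sequence[Tuple[str, str]],
               skip: Sequence[str] = ("WS",)) -> Callable[[str], Iterator[Token]]:
    """Build a lexer from (token name, regex) pairs; earlier rules win."""
    master = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in rules))

    def lex(src: str) -> Iterator[Token]:
        pos = 0
        while pos < len(src):
            m = master.match(src, pos)
            if m is None:
                raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
            if m.lastgroup not in skip:   # e.g. drop whitespace tokens
                yield m.lastgroup, m.group()
            pos = m.end()
    return lex

# One possible rule table for a toy language; another language needs another table.
TOY_RULES = [
    ("WS", r"\s+"),
    ("KEYWORD", r"(?:if|then|else)\b"),
    ("ID", r"[A-Za-z_]\w*"),
    ("NUM", r"\d+"),
    ("OP", r"[-+*/<>=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
]
```

A parser and interpreter would then consume the resulting `(name, text)` pairs, again configured per language.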

    + +

    + Skills: + Good skills in lexing and parsing, as well as fluency in web programming (for + example JavaScript). +

    + + +
  • [CU10] Implementation of a Distributed Clock-Synchronisation Algorithm developed at NASA

    + +

    + Description: + There are many algorithms for synchronising clocks. This + paper + describes a new algorithm for clocks that communicate by exchanging + messages and thereby reach a state in which (within some bound) all clocks are synchronised. + A slightly longer and more detailed paper about the algorithm is + here. + The point of this project is to implement this algorithm and simulate networks of clocks. +
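The simulation side can be prototyped before tackling the paper's algorithm. The following minimal Python sketch is explicitly not the NASA algorithm. It uses a naive scheme of our own: every `sync_every` ticks, each clock reads all the others and adopts the average. Each reading is perturbed by a random transmission error of up to `max_err`. All parameter names and values are assumptions for illustration. The actual project would replace the averaging step with the algorithm from the paper and model the clocks as communicating actors, as suggested in the Skills paragraph of this project.

```python
import random
from typing import List

def simulate(drifts: List[float], ticks: int, sync_every: int = 0,
             max_err: float = 0.01, seed: int = 0) -> float:
    """Return the spread (max - min) of the clocks after `ticks` steps.

    Clock i advances by 1 + drifts[i] per tick. If sync_every > 0, then every
    sync_every ticks each clock replaces its value by the average of all
    readings, where readings of other clocks carry a random error in
    [0, max_err]. Naive placeholder scheme, not the NASA algorithm."""
    rng = random.Random(seed)
    n = len(drifts)
    clocks = [0.0] * n
    for t in range(1, ticks + 1):
        clocks = [c + 1.0 + d for c, d in zip(clocks, drifts)]
        if sync_every and t % sync_every == 0:
            clocks = [sum(clocks[j] + (0.0 if j == i else rng.uniform(0.0, max_err))
                          for j in range(n)) / n
                      for i in range(n)]
    return max(clocks) - min(clocks)
```

Consider five clocks with drifts from -2·10⁻⁴ to +2·10⁻⁴ per tick. Running freely, they end up 4 time units apart after 10,000 ticks. With averaging every 100 ticks, the spread right after a synchronisation round is below `max_err`.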

    + +

    + Literature: + There is a wide range of literature on clock synchronisation algorithms. + Some pointers are given in this + paper, + which describes the algorithm to be implemented in this project. Further pointers + are given here. +

    + +

    + Skills: + In order to implement a simulation of a network of clocks, you need to tackle + concurrency. You can do this for example in the programming language + Scala with the help of the + Akka library. This library enables you to send messages + between different actors. Here + are some examples that explain how to implement exchanging messages between actors. +

    + +
  • [CU11] Proving the Correctness of Programs

    + +

    + I am one of the main developers of the interactive theorem prover + Isabelle. This theorem prover + has been used to establish the correctness of some quite large + programs (for example an operating system). + Together with colleagues from Nanjing, I used this theorem prover to establish the correctness of a + scheduling algorithm, called + Priority Inheritance, + for real-time operating systems. This scheduling algorithm is part of the operating + system that drives, for example, the + Mars rovers. + Actually, the very first Mars rover mission in 1997 did not have this + algorithm switched on and it almost caused a catastrophic mission failure (see + this YouTube video + for an explanation of what happened). + We were able to prove the correctness of this algorithm, and were also able to + establish the correctness of some optimisations in this + paper. +
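The core rule of Priority Inheritance can be stated in a few lines. The following minimal Python sketch is a simplified model of our own, not the Isabelle formalisation. A thread's effective priority is the maximum of its own base priority and the effective priorities of all threads blocked on locks it holds, applied transitively. Larger numbers mean higher priority, and the wait-for graph is assumed to be acyclic. The thread and lock names in the scenario are hypothetical.

```python
from typing import Dict, List

def effective_priority(thread: str,
                       base: Dict[str, int],
                       holder: Dict[str, str],
                       waiting: Dict[str, str]) -> int:
    """base: thread -> base priority; holder: lock -> thread holding it;
    waiting: thread -> lock it is blocked on.
    Assumes the wait-for graph has no cycles (no deadlock)."""
    prio = base[thread]
    for waiter, lock in waiting.items():
        if holder.get(lock) == thread:
            # Inherit the effective priority of every thread we are blocking.
            prio = max(prio, effective_priority(waiter, base, holder, waiting))
    return prio

def pick_next(threads: List[str], base: Dict[str, int],
              holder: Dict[str, str], waiting: Dict[str, str]) -> str:
    """Run the runnable (non-blocked) thread with the highest effective priority."""
    runnable = [t for t in threads if t not in waiting]
    return max(runnable, key=lambda t: effective_priority(t, base, holder, waiting))
```

Consider a scenario where a low-priority thread holds a lock that a high-priority thread is waiting for. The low-priority thread inherits the high priority, so a medium-priority thread can no longer preempt it. That kind of preemption is what caused the trouble described above.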

    + +

    On a much smaller scale, there are a few small programs and underlying algorithms where it + is not really understood whether they always compute a correct result (for example the + regular expression matcher by Sulzmann and Lu in project [CU1]). The aim of this + project is to completely specify an algorithm in Isabelle and then prove it correct (that is, + prove that it always computes the correct result). +

    + +

    + Skills: + This project is for a very good student with a knack for theoretical things and formal reasoning. +

    + +
  • Earlier Projects

    + + I am also open to project suggestions from you. You might find some inspiration from my earlier projects: + BSc 2012/13, + MSc 2012/13, + BSc 2013/14 and + MSc 2013/14. +
+
+ +

+ Last modified: Sun Nov 9 21:37:30 GMT 2014 + +