2012/13 MSc Projects

diff -r 7acf8ff8cb0d -r a73de9a29bb5 msc-projects-12.html
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/msc-projects-12.html Tue Nov 06 00:04:58 2012 +0000
@@ -0,0 +1,408 @@


2012/13 MSc Projects

Supervisor: Christian Urban

Email: christian dot urban at kcl dot ac dot uk, Office: Strand Building S1.27

If you are interested in a project, please send me an email and we can discuss details. Please include a short description of your programming skills and Computer Science background in your first email. I will also need your King's username in order to book the project for you. Thanks.


Note that besides being a lecturer at the theoretical end of Computer Science, I am also a passionate hacker … defined as “a person who enjoys exploring the details of programmable systems and stretching their capabilities, as opposed to most users, who prefer to learn only the minimum necessary.” I am always happy to supervise like-minded students.


[CU1] Regular Expression Matching and Partial Derivatives

Description: Regular expressions are extremely useful for many text-processing tasks: finding patterns in texts, lexing programs, syntax highlighting and so on. Given that regular expressions were introduced in the 1950s by Stephen Kleene, you might think regular expressions have since been studied to death. But you would definitely be mistaken: in fact they are still an active research area. For example, this paper about regular expression matching and partial derivatives was presented this summer at the international PPDP'12 conference.

The background for this project is that some regular expressions are "evil" and can "stab you in the back" according to this recent blog post. For example, if you use in Python or in Ruby (probably also in other mainstream programming languages) the innocent-looking regular expression a?{28}a{28} and match it, say, against the string aaaaaaaaaaaaaaaaaaaaaaaaaaaa, you will soon notice that your CPU usage goes to 100%. In fact, Python and Ruby need approximately 30 seconds for matching this string. You can try it for yourself: re.py (Python version) and re.rb (Ruby version). You can imagine an attacker mounting a nice DoS attack against your program if it contains such an "evil" regular expression. Actually Scala (and also Java) are almost immune from such attacks as they can deal with strings of up to 4,300 a's in less than a second. But if you scale the regular expression and string further to, say, 4,600 a's, you get a StackOverflowError exception crashing your program.

On a rainy afternoon, I implemented this regular expression matcher in Scala. It is not as fast as the official one in Scala, but it can match up to 11,000 a's in less than 5 seconds without raising any exception (remember Python and Ruby both need nearly 30 seconds to process 28(!) a's, and Scala's official matcher maxes out at 4,600 a's). My matcher is approximately 85 lines of code and based on the concept of derivatives of regular expressions. Derivatives were introduced in 1964 by Janusz Brzozowski, but according to this paper had been lost in the "sands of time". The advantage of derivatives is that they completely side-step the usual translations of regular expressions into NFAs or DFAs, which can introduce the exponential behaviour exhibited by the regular expression matchers in Python and Ruby.
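
To make the idea of derivatives concrete, here is a minimal Python sketch of a Brzozowski-style matcher. It is not the Scala matcher mentioned above and makes no claim about its performance. All names (`Zero`, `One`, `der`, `evil`, ...) are chosen for this illustration only. The derivative of a regular expression with respect to a character c describes what remains to be matched after c has been read. A string matches if the expression left after all characters have been consumed can match the empty string.

```python
from dataclasses import dataclass

class Rexp:
    """Base class for regular expressions (names are illustrative)."""

@dataclass(frozen=True)
class Zero(Rexp):      # matches no string at all
    pass

@dataclass(frozen=True)
class One(Rexp):       # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr(Rexp):       # matches the single character c
    c: str

@dataclass(frozen=True)
class Alt(Rexp):       # r1 | r2
    r1: Rexp
    r2: Rexp

@dataclass(frozen=True)
class Seq(Rexp):       # r1 followed by r2
    r1: Rexp
    r2: Rexp

@dataclass(frozen=True)
class Star(Rexp):      # zero or more repetitions of r
    r: Rexp

def nullable(r):
    """Can r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False

def alt(r1, r2):
    """Alternative with light simplification to keep derivatives small."""
    if r1 == Zero():
        return r2
    if r2 == Zero() or r1 == r2:
        return r1
    return Alt(r1, r2)

def seq(r1, r2):
    """Sequence with light simplification."""
    if r1 == Zero() or r2 == Zero():
        return Zero()
    if r1 == One():
        return r2
    if r2 == One():
        return r1
    return Seq(r1, r2)

def der(c, r):
    """Derivative of r with respect to the character c."""
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = seq(der(c, r.r1), r.r2)
        return alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return seq(der(c, r.r), r)
    return Zero()  # Zero and One have no derivative

def matches(r, s):
    """Take the derivative for each character, then test for nullability."""
    for c in s:
        r = der(c, r)
    return nullable(r)

def evil(n):
    """Build (a?){n} a{n}, the shape of the 'evil' expression above."""
    r = One()
    for part in [Chr("a")] * n + [Alt(One(), Chr("a"))] * n:
        r = seq(part, r)
    return r
```

For example, `matches(evil(6), "a" * 6)` holds, while `matches(evil(6), "a" * 5)` does not. The simplifications in `alt` and `seq` are only the most basic ones, so the sketch says nothing about how large the strings can get before derivatives grow too big.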

Now the guys from the PPDP'12-paper mentioned above claim they are even faster than me and can deal with even more features of regular expressions (for example subexpression matching, which my rainy-afternoon matcher lacks). I am sure they thought about the problem much longer than a single afternoon. The task in this project is to find out how good they actually are by implementing the results from their paper. Their approach is based on the concept of partial derivatives introduced in 1994 by Valentin Antimirov. I used them once in order to prove the Myhill-Nerode theorem by using only regular expressions.
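
As a starting point for experiments, the following Python sketch shows plain Antimirov-style partial derivatives used for matching. It is not the algorithm of the PPDP'12 paper (in particular, it does no subexpression matching), and the tuple encoding and all function names are assumptions made for this illustration. Instead of a single derivative, `pder` returns a set of regular expressions, and the matcher tracks a set of such expressions, much like the set of current states of an NFA.

```python
# Regular expressions are encoded as nested tuples so they can live in sets.
ZERO = ("zero",)                     # matches nothing
ONE = ("one",)                       # matches the empty string

def CHR(c):
    return ("chr", c)

def ALT(r1, r2):
    return ("alt", r1, r2)

def SEQ(r1, r2):
    # Drop a leading ONE so that equal expressions stay syntactically equal.
    return r2 if r1 == ONE else ("seq", r1, r2)

def STAR(r):
    return ("star", r)

def nullable(r):
    """Can r match the empty string?"""
    tag = r[0]
    if tag in ("one", "star"):
        return True
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "seq":
        return nullable(r[1]) and nullable(r[2])
    return False

def pder(c, r):
    """Set of partial derivatives of r with respect to the character c."""
    tag = r[0]
    if tag == "chr":
        return {ONE} if r[1] == c else set()
    if tag == "alt":
        return pder(c, r[1]) | pder(c, r[2])
    if tag == "seq":
        left = {SEQ(p, r[2]) for p in pder(c, r[1])}
        return (left | pder(c, r[2])) if nullable(r[1]) else left
    if tag == "star":
        return {SEQ(p, r) for p in pder(c, r[1])}
    return set()  # ZERO and ONE have no partial derivatives

def matches(r, s):
    """Track the set of partial derivatives while reading s."""
    states = {r}
    for c in s:
        states = {p for q in states for p in pder(c, q)}
    return any(nullable(q) for q in states)

def evil(n):
    """Build (a?){n} a{n} for n >= 1."""
    parts = [ALT(ONE, CHR("a"))] * n + [CHR("a")] * n
    r = parts[-1]
    for part in reversed(parts[:-1]):
        r = SEQ(part, r)
    return r
```

With this encoding `matches(evil(28), "a" * 28)` succeeds, because the set of partial derivatives of this expression only ever contains suffixes of the original expression.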

Literature: The place to start with this project is obviously this paper. Traditional methods for regular expression matching are explained in the Wikipedia articles here and here. The authoritative book on automata and regular expressions is by John Hopcroft and Jeffrey Ullman (available in the library). There is also an online course about this topic by Ullman at Coursera, though IMHO not done with love. Finally, there are millions of other pointers about regular expression matching on the Net. Test cases for "evil" regular expressions can be obtained from here.

Skills: This is a project for a student with an interest in theory and some reasonable programming skills. The project can be easily implemented in languages like Scala, ML, Haskell, Python, etc.

[CU3] Machine Code Generation for a Simple Compiler

Description: Compilers translate high-level programs that humans can read and write into efficient machine code that can be run on a CPU or virtual machine. I recently implemented a very simple compiler for a very simple functional programming language following this paper (also described here). My Scala code for this compiler is here. The compiler can deal with simple programs involving natural numbers, such as Fibonacci numbers or factorial (but it can be easily extended - that is not the point).

While the hard work has been done (understanding the two papers above), my compiler only produces some idealised machine code. For example, I assume there are infinitely many registers. The goal of this project is to generate machine code that is more realistic and can run on a CPU, like x86, or on a virtual machine, say the JVM. This would probably give a thousandfold speedup in comparison to my naive machine code and virtual machine. The project requires digging into the literature about real CPUs and generating real machine code.
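
To give a flavour of what targeting a stack-based virtual machine such as the JVM involves, here is a small Python sketch that translates arithmetic expressions into a list of JVM instruction mnemonics (`ldc`, `iadd`, `isub`, `imul`). It is unrelated to the Scala compiler mentioned above. The tuple representation of expressions and the helper `run`, which merely simulates the operand stack to check the generated code, are assumptions made for this illustration.

```python
# Expressions are either an int or a tuple (operator, left, right).
OPS = {"+": "iadd", "-": "isub", "*": "imul"}

def compile_expr(e):
    """Emit code that leaves the value of e on top of the operand stack."""
    if isinstance(e, int):
        return [f"ldc {e}"]                 # push a constant
    op, left, right = e
    # Post-order traversal: operands first, then the operator.
    return compile_expr(left) + compile_expr(right) + [OPS[op]]

def run(code):
    """Tiny stand-in for the JVM operand stack, only for checking the output."""
    stack = []
    for instr in code:
        if instr.startswith("ldc "):
            stack.append(int(instr[4:]))
        else:
            b, a = stack.pop(), stack.pop()
            stack.append({"iadd": a + b, "isub": a - b, "imul": a * b}[instr])
    return stack.pop()
```

For the expression `("+", 1, ("*", 2, 3))` this produces `ldc 1`, `ldc 2`, `ldc 3`, `imul`, `iadd`. A real back end would additionally have to emit class and method headers, handle local variables and function calls, and respect the stack-size limits of the target.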

Literature: There is a lot of literature about compilers (for example this book - I can lend you my copy for the duration of the project). A very good overview article about implementing compilers by Laurie Tratt is here. An introduction to x86 machine code is here. Intel's official manual for the x86 instruction set is here. A simple assembler for the JVM is described here. An interesting twist of this project is to not generate code for a CPU, but for the intermediate language of the LLVM compiler (also described here and here). If you want to see what machine code looks like, you can compile your C program using gcc -S.

Skills: This is a project for a student with a deep interest in programming languages and compilers. Since my compiler is implemented in Scala, it would make sense to continue this project in this language. I can be of help with questions and books about Scala. But if Scala is a problem, my code can also be translated quickly into any other functional language.

[CU4] Implementation of Register Spilling Algorithms

Description: This project is similar to [CU3]. The emphasis here, however, is on the implementation and comparison of register spilling algorithms, also often called register allocation algorithms. They are part of any respectable compiler. As said in [CU3], however, my simple compiler lacks them and assumes an infinite number of registers instead. Real CPUs, however, only provide a fixed number of registers (for example x86-64 has 16 general purpose registers). Whenever a program needs to hold more values than there are registers, the values need to be “spilled” into the main memory. Register spilling algorithms try to minimise this spilling, since fetching values from main memory is a costly operation.

The classic algorithm for register spilling uses a graph-colouring method. However, for some time the LLVM compiler used a supposedly more efficient method, called the linear scan allocation method (described here). However, it was later decided to abandon this method in favour of a greedy register allocation method. It would be nice if this project could find out what the issues are with these methods and implement at least one of them for the simple compiler referenced in [CU3].
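
The basic shape of linear scan allocation can be sketched in a few lines of Python. This is a simplified textbook-style version, not LLVM's implementation. It assumes that every variable has a single live interval `(start, end)` that is already known, and that there is at least one register. The function name and the data layout are made up for this illustration.

```python
def linear_scan(intervals, num_regs):
    """Map each variable to a register index or to the string "spill".

    intervals: dict from variable name to (start, end) of its live interval.
    num_regs:  number of available registers (assumed to be at least 1).
    """
    free = list(range(num_regs))   # registers currently unused
    active = []                    # (end, name) of intervals holding a register
    location = {}
    by_start = sorted(intervals.items(), key=lambda kv: kv[1][0])
    for name, (start, end) in by_start:
        # Expire intervals that ended before the current one starts.
        while active and active[0][0] < start:
            _, old = active.pop(0)
            free.append(location[old])
        if free:
            location[name] = free.pop(0)
            active.append((end, name))
        else:
            # No register left: spill whichever interval ends last.
            last_end, victim = active[-1]
            if last_end > end:
                location[name] = location[victim]
                location[victim] = "spill"
                active[-1] = (end, name)
            else:
                location[name] = "spill"
        active.sort()              # keep active ordered by end point
    return location
```

With the intervals `{"a": (0, 10), "b": (1, 3), "c": (2, 8), "d": (4, 6)}` and two registers, the long-lived variable `a` is the one that gets spilled. With three registers nothing is spilled. Comparing such results against a graph-colouring allocator on the same intervals is one way to start the evaluation this project asks for.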

Literature: The graph colouring method is described in Andrew Appel's book on compilers (I can give you my copy of this book, if it is not available in the library). There is also a survey article about register allocation algorithms with further pointers.

Skills: Same skills as [CU3].

[CU5] A Student Polling System

Description: One of the more annoying aspects of giving a lecture is asking the students a question and, no matter how easy the question is, not receiving an answer. Recently, the online course system Udacity made an art out of asking questions during lectures (see for example the Web Application Engineering course CS253). The lecturer there gives multiple-choice questions as part of the lecture and the students need to click on the appropriate answer. This works very well in the online world. For “real-world” lectures, the department has some clickers (these are little devices that are part of an audience response system). However, they are a logistical nightmare for the lecturer: they need to be distributed during the lecture and collected at the end. Nowadays, when students come with their own laptop or smartphone to lectures, this can be improved.

The task of this project is to implement an online student polling system. The lecturer should be able to prepare questions beforehand (encoded as some web-form) and be able to show them during the lecture. The students can give their answers by clicking on the corresponding webpage. The lecturer can then collect the responses online and evaluate them immediately. Such a system is sometimes called HTML voting. There are a number of commercial solutions for this problem, but they are not easy to use (in addition to being ridiculously expensive). A good student can easily improve upon what they provide.

The problem of student polling is not as hard as electronic voting, which essentially is still an unsolved problem in Computer Science. The students only need to be prevented from answering a question more than once and thus skewing any statistics. Unlike electronic voting, no audit trail needs to be kept for student polling. Restricting the number of answers can probably be solved by setting appropriate cookies on the students' computers or smartphones.
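
The server-side bookkeeping behind the cookie idea could look roughly like the following Python sketch. It deliberately leaves out the web framework: the class `Poll` and its methods are hypothetical, and the token returned by `issue_token` stands for the value that would be stored in a cookie on the student's device.

```python
import secrets
from collections import Counter

class Poll:
    """One multiple-choice question with at most one answer per token."""

    def __init__(self, question, options):
        self.question = question
        self.options = list(options)
        self.issued = set()      # tokens handed out (the cookie values)
        self.voted = set()       # tokens that have already answered
        self.tally = Counter()   # option -> number of answers

    def issue_token(self):
        """Create a fresh random token for a student's browser."""
        token = secrets.token_hex(16)
        self.issued.add(token)
        return token

    def answer(self, token, option):
        """Record an answer; reject unknown tokens, repeats and bad options."""
        if token not in self.issued or token in self.voted:
            return False
        if option not in self.options:
            return False
        self.voted.add(token)
        self.tally[option] += 1
        return True
```

A second call to `answer` with the same token is rejected, so `tally` counts each device once. A student who deletes the cookie could request a fresh token, which is acceptable here because, as noted above, polling does not need the guarantees of electronic voting.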

Literature: The project requires fluency in a web-programming language (for example JavaScript, PHP, Java, Python, Go, Scala, Ruby) and possibly a cloud application platform (for example Google App Engine or Heroku). For web-programming the Web Application Engineering course at Udacity is a good starting point to be aware of the issues involved. This course uses Python. To evaluate the answers from the students, Google's Chart Tools might be useful, which are also described in this YouTube video.

Skills: In order to provide convenience for the lecturer, this project needs very good web-programming skills. A hacker mentality (see above) is probably very beneficial: web-programming is an area that only emerged recently and many tools still lack maturity. You probably have to experiment a lot with several different languages and tools.

[CU6] Implementation of a Distributed Clock-Synchronisation Algorithm developed at NASA

Description: There are many algorithms for synchronising clocks. This paper describes a new algorithm for clocks that communicate by exchanging messages and thereby reach a state in which (within some bound) all clocks are synchronised. A slightly longer and more detailed paper about the algorithm is here. The point of this project is to implement this algorithm and simulate networks of clocks.

Literature: There is a wide range of literature on clock synchronisation algorithms. Some pointers are given in this paper, which describes the algorithm to be implemented in this project. Pointers are given also here.

Skills: In order to implement a simulation of a network of clocks, you need to tackle concurrency. You can do this for example in the programming language Scala with the help of the Akka library. This library enables you to send messages between different actors. Here are some examples that explain how to implement exchanging messages between actors.
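
Before tackling real concurrency, it can help to prototype a network of clocks in a purely sequential simulation. The Python sketch below is not the algorithm from the paper. It uses a simple neighbour-averaging rule as a stand-in, assumes that messages arrive instantly and in lock-step rounds, and ignores clock drift. All function names are invented for this illustration.

```python
def ring(n):
    """Neighbour map of a ring network with n clocks."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def sync_round(clocks, neighbours):
    """Every clock adopts the mean of its own value and its neighbours' values."""
    return [
        (clocks[i] + sum(clocks[j] for j in neighbours[i])) / (1 + len(neighbours[i]))
        for i in range(len(clocks))
    ]

def simulate(clocks, neighbours, rounds):
    """Run the given number of message-exchange rounds."""
    for _ in range(rounds):
        clocks = sync_round(clocks, neighbours)
    return clocks

def spread(clocks):
    """Largest difference between any two clocks."""
    return max(clocks) - min(clocks)
```

Starting from `[0.0, 10.0, 20.0, 30.0, 40.0]` on `ring(5)`, the spread shrinks with every round and the clocks approach their common mean. In an actor-based version each clock would become an actor and each round would be replaced by asynchronous messages with delays, which is where the bound mentioned in the description becomes interesting.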


Last modified: Wed Sep 12 16:30:03 GMT 2012
