2011/12 MSc Individual Projects

Supervisor: Christian Urban

Email: christian dot urban at kcl dot ac dot uk, Office: Strand Building S6.30

If you are interested in a project, please send me an email and we can discuss details. Please include a short description of your programming skills and computer science background in your first email. I will also need your King's username in order to book the project for you. Thanks.

  • [CU1] Implementing a SAT-Solver in a Functional Programming Language

    Description: SAT-solvers search for satisfying assignments of boolean formulas. Although this is a computationally hard problem (NP-complete), modern SAT-solvers routinely solve boolean formulas with 100,000 or more variables. The application areas of SAT-solvers are manifold: they range from hardware verification to Sudoku solvers. Every two years there is a competition of the best SAT-solvers in the world.

    Most SAT-solvers are written in C. The aim of this project is to design and implement a SAT-solver in a functional programming language (preferably ML, but Haskell, Scala, OCaml, ... are also OK). The starting point is the open source SAT-solver MiniSat (available here). The long-term hope is that your implementation becomes part of the interactive theorem prover Isabelle. For this the SAT-solver needs to be implemented in ML.

    Tasks: Understand MiniSat, design and code a SAT-solver in ML, then empirically evaluate and tune your code.

    Literature: A good starting point for reading about SAT-solving is the handbook article here. MiniSat is explained here and here. The standard reference for ML is here (I can lend you my copy of this book for the duration of the project). The best free implementation of ML is PolyML.
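
    To give a rough idea of what "searching for a satisfying assignment" means, here is a minimal sketch in Python of the classic DPLL procedure (unit propagation plus branching). It is only an illustration: MiniSat uses considerably more refined techniques, and the project itself should be written in a functional language. The clause representation (lists of non-zero integers, as in the DIMACS format) and the names simplify and dpll are assumptions made for this example.

```python
# Clauses are lists of non-zero integers: 3 stands for x3, -3 for "not x3".
# (Representation and function names are assumptions made for this sketch.)

def simplify(clauses, lit):
    """Assume `lit` is true: drop satisfied clauses, delete the negated literal elsewhere."""
    result = []
    for clause in clauses:
        if lit in clause:
            continue                      # clause is already satisfied
        result.append([l for l in clause if l != -lit])
    return result

def dpll(clauses, assignment=None):
    """Return a dict {variable: bool} satisfying all clauses, or None if there is none."""
    assignment = dict(assignment or {})
    # Unit propagation: a clause with a single literal forces that literal.
    while True:
        if any(len(c) == 0 for c in clauses):
            return None                   # empty clause: conflict
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        lit = units[0]
        assignment[abs(lit)] = lit > 0
        clauses = simplify(clauses, lit)
    if not clauses:
        return assignment                 # every clause has been satisfied
    # Branch on the first literal of the first remaining clause.
    lit = clauses[0][0]
    for choice in (lit, -lit):
        extended = {**assignment, abs(choice): choice > 0}
        result = dpll(simplify(clauses, choice), extended)
        if result is not None:
            return result
    return None
```

    Variables that never had to be decided are simply missing from the returned dictionary; they can take either value.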

  • [CU2] A Compiler for System F

    Description: System F is a mini programming language, which is often used to study the theory behind programming languages, but is also used as a core language of functional programming languages (for example Haskell). The language is small enough that a compiler to an idealised assembly language (preferably TAL) or to an abstract machine can be implemented in a reasonable amount of time. This has been explained in full detail in a PhD thesis by Louis-Julien Guillemette (available in English here). He used Haskell as his implementation language. Other choices are possible.

    Tasks: Read the relevant literature and implement the various components of a compiler (parser, intermediate languages, simulator for the idealised assembly language). This project is for a good student with an interest in programming languages, who can also translate abstract ideas into code. If it is too difficult, the project can be easily scaled down to the simply-typed lambda calculus (which is simpler than System F) or to cover only some components of the compiler.

    Literature: The PhD thesis by Louis-Julien Guillemette is required reading. A shorter paper about this subject is available here. A good starting point for TAL is here. There is a lot of literature about compilers (for example this book - I can lend you my copy for the duration of the project). A very good overview article about implementing compilers by Laurie Tratt is here.

  • [CU3] Sorting Suffixes

    Description: Given a string, take all its suffixes, and sort them. This is often called suffix array sorting. It sounds simple, but there are some difficulties. The naive algorithm would generate all suffix strings and sort them using a standard sorting algorithm, for example quicksort. The problem is that this algorithm is not optimal for suffix sorting: it does not exploit the fact that the strings being sorted are all suffixes of the same string, and it takes a quadratic amount of space. This is a huge problem if you have to sort strings of several megabytes or even gigabytes, as often happens in biotech and DNA data mining. Suffix sorting is also a crucial operation for the Burrows-Wheeler transform on which the data compression algorithm of the popular bzip2 program is based.

    There are more efficient algorithms for suffix sorting, for example here and here. However, the most space-efficient algorithm for suffix sorting (here) is horrendously complicated. Your task would be to understand it, and then implement it.

    Tasks: Start by reading the literature about suffix sorting. Then work through the 12-page paper explaining the horrendously complicated algorithm and implement it. Time permitting, the work can include an implementation of the Burrows-Wheeler data compression. This project is for a good student who likes to study algorithms in depth. The project can be carried out in almost all programming languages, including C, Java, Scala, ML, Haskell and so on.

    Literature: A good starting point for reading about suffix sorting is the book by Crochemore. Another good introduction is here, which also gives good pointers for why efficient suffix sorting is practically relevant. Two simple algorithms are described here. The main literature is the 12-page article about in-place suffix sorting. The Burrows-Wheeler data compression is described here.
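
    The naive algorithm from the description can be sketched in a few lines of Python. This is the baseline the project is meant to improve on, not one of the efficient algorithms from the literature. The function names and the choice of "$" as an end marker are assumptions made for this example; the marker must not occur in the text and must compare smaller than every character of it.

```python
def naive_suffix_array(text):
    """Start positions of all suffixes of `text`, ordered by the suffixes they start."""
    # Every sort key text[i:] is a fresh copy of a suffix, so all keys
    # together need a quadratic amount of space.
    return sorted(range(len(text)), key=lambda i: text[i:])

def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform of `text`, read off the suffix array.

    Assumes `sentinel` does not occur in `text` and sorts before all its characters.
    """
    assert sentinel not in text
    s = text + sentinel
    # The transformed string consists of the character preceding each sorted suffix
    # (for the suffix starting at 0 this wraps around to the sentinel).
    return "".join(s[i - 1] for i in naive_suffix_array(s))

print(naive_suffix_array("banana"))   # [5, 3, 1, 0, 4, 2]
print(bwt("banana"))                  # annb$aa
```

    The transform groups equal characters together ("nn", "aa"), which is what makes the output easier to compress.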

  • [CU4] Simplification with Equivalence Relations in the Isabelle Theorem Prover

    Description: In this project you have to extend the simplifier of the Isabelle theorem prover. The simplifier is an important reasoning tool of this theorem prover: it replaces a term by another term that can be proved to be equal to it. However, currently the simplifier only rewrites terms according to equalities. Assuming ≈ is an equivalence relation, the simplifier should also be able to rewrite terms according to ≈. Since equivalence relations occur frequently in automated reasoning, this extension would make the simplifier more powerful and useful. The hope is that your code can go into the code base of Isabelle.

    Tasks: Read the paper about rewriting with equivalence relations. Get familiar with parts of the implementation of Isabelle (I will be of as much help as I can). Implement the extension. This project is suitable for a student with a bit of math background. It requires knowledge of the functional programming language ML, which however can be learned quickly provided you have already written code in another functional programming language.

    Literature: A good starting point for reading about rewriting modulo equivalences is the paper here, which uses the ACL2 theorem prover. The implementation of the Isabelle theorem prover is described in much detail in this programming tutorial. The standard reference for ML is here (I can lend you my copy of this book for the duration of the project).

  • [CU5] Lexing and Parsing with Derivatives

    Description: Lexing and parsing are usually done using automated tools, like lex and yacc. The problem with them is that they "work when they work", but if they do not, then they are black boxes which are difficult to debug and change. They are quite clumsy, to the point that Might and Darais wrote a paper titled "Yacc is dead".

    There is a simple algorithm for regular expression matching (that is lexing). This algorithm was introduced by Brzozowski in 1964. It is based on the notion of derivatives of regular expressions and has proved useful for practical lexing. Last year the notion of derivatives was extended by Might et al. to context-free grammars and parsing.

    Tasks: Get familiar with the two algorithms and implement them. Regular expression matching is relatively simple; parsing with derivatives is the harder part. Therefore you should empirically evaluate this part and tune your implementation. The project can be carried out in almost all programming languages, including C, Java, Scala, ML, Haskell and so on.

    Literature: This paper gives a modern introduction to derivative-based lexing. Derivative-based parsing is explained here and here.
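
    To show how little is needed for the regular expression part, here is a minimal sketch of a Brzozowski-style matcher in Python. The derivative of a regular expression r with respect to a character c matches exactly those strings s for which r matches c followed by s; a string is matched by taking derivatives character by character and checking at the end whether the empty string is accepted. The tuple encoding and the names nullable, deriv and matches are assumptions made for this example. The sketch does not simplify intermediate expressions, so they grow with the input; a practical lexer would need to address that.

```python
# Regular expressions as tuples (an encoding chosen for this sketch):
#   ("empty",)      matches nothing         ("eps",)        matches only ""
#   ("chr", c)      matches the character c
#   ("alt", r, s)   r or s                  ("seq", r, s)   r followed by s
#   ("star", r)     zero or more repetitions of r
EMPTY, EPS = ("empty",), ("eps",)

def nullable(r):
    """Does r match the empty string?"""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])        # seq

def deriv(c, r):
    """Derivative of r with respect to the character c."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return ("alt", deriv(c, r[1]), deriv(c, r[2]))
    if tag == "seq":
        left = ("seq", deriv(c, r[1]), r[2])
        # If the first part can match "", c may also be consumed by the second part.
        return ("alt", left, deriv(c, r[2])) if nullable(r[1]) else left
    return ("seq", deriv(c, r[1]), r)               # star

def matches(r, s):
    """Take the derivative for each character, then test for the empty string."""
    for c in s:
        r = deriv(c, r)
    return nullable(r)

# (a|b)*c
regex = ("seq", ("star", ("alt", ("chr", "a"), ("chr", "b"))), ("chr", "c"))
print(matches(regex, "abac"))   # True
print(matches(regex, "ab"))     # False
```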

  • [CU6] Equivalence Checking of Regular Expressions using the Method by Antimirov and Mosses

    Description: A procedure for deciding the equivalence of regular expressions can be used to solve a number of problems in automated reasoning. Therefore one would like to have a method for equivalence checking that is as fast as possible. A number of algorithms have been proposed in the past, but one based on a method by Antimirov and Mosses seems relatively simple and easy to implement.

    Tasks: The task is to implement the algorithm by Antimirov and Mosses and compare it to other methods. Hopefully the algorithm can be tuned to be faster than other methods. The project can be carried out in almost all programming languages, but as usual functional programming languages such as Scala, ML and Haskell have an edge for this kind of problem.

    Literature: Central to this project are the papers here and here. Other methods have been described, for example, here. A relatively complicated method, based on automata, is described here.

  • [CU7] Game-Playing Engine for Five-In-A-Row on a Large Board

    Literature: There is a web-page with various pointers to computer players here. There are also some good books about computer players, for example:
    Computer Game-Playing: Theory and Practice by M. Bramer, Ellis Horwood Ltd, 1983. (Considers techniques used for programming a variety of games: Chess, Go, Scrabble, Billiards, Othello, etc; includes theoretical issues such as game searching)
    Chips Challenging Champions: Games, Computers and Artificial Intelligence by J. Schaeffer and H.J. van den Herik, North Holland, 2002.
    Artificial Intelligence for Games by I. Millington and J. Funge, Morgan Kaufmann, 2009.
    Computer Gamesmanship: The Complete Guide to Creating and Structuring Intelligent Games Programs by D.N.L. Levy, Simon and Schuster, 1983.

Last modified: Wed Jan 11 16:30:03 GMT 2012