<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HEAD>
<TITLE>Christian Urban</TITLE>
<BASE HREF="http://www.inf.kcl.ac.uk/staff/urbanc/">
<script type="text/javascript" src="striper.js"></script>
<link rel="stylesheet" href="nominal.css">
</HEAD>
<BODY TEXT="#000000"
BGCOLOR="#4169E1"
LINK="#0000EF"
VLINK="#51188E"
ALINK="#FF0000"
ONLOAD="striper('ul','striped','li','first,second')">
<TABLE WIDTH="100%"
BGCOLOR="#4169E1"
BORDER="0"
FRAME="border"
CELLPADDING="10"
CELLSPACING="2"
RULES="all">
<TR>
<TD BGCOLOR="#FFFFFF"
WIDTH="75%"
VALIGN="TOP">
<H2>2011/12 MSc Individual Projects</H2>
<H4>Supervisor: Christian Urban</H4>
<H4>Email: christian dot urban at kcl dot ac dot uk, Office: Strand Building S6.30</H4>
<H4>If you are interested in a project, please send me an email and we can discuss details. Please include
a short description of your programming and computer science background in your first email. Thanks.</H4>
<ul class="striped">
<li> <H4>[CU1] Implementing a SAT-Solver in a Functional Programming Language</H4>
<p><B>Description:</b>
SAT-solvers search for satisfying assignments of boolean formulas. Although this
is a computationally hard problem (<A HREF="http://en.wikipedia.org/wiki/NP-complete">NP-complete</A>),
modern SAT-solvers routinely solve boolean formulas with 100,000 and more variables.
Application areas of SAT-solvers are manifold: they range from hardware verification to
Sudoku solvers. Every two years there is a competition of the best SAT-solvers in the world.</p>
<p>
Most SAT-solvers are written in C. The aim of this project is to design and implement
a SAT-solver in a functional programming language (preferably
<A HREF="http://en.wikipedia.org/wiki/Standard_ML">ML</A>, but
<A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>,
<A HREF="http://www.scala-lang.org/">Scala</A>,
<A HREF="http://caml.inria.fr/">OCaml</A>, ... are also OK). The starting point is
the open-source SAT-solver MiniSat (available <A HREF="http://minisat.se/Main.html">here</A>).
The long-term hope is that your implementation becomes part of the interactive theorem prover
<A HREF="http://www.cl.cam.ac.uk/research/hvg/isabelle/">Isabelle</A>. For this
the SAT-solver needs to be implemented in ML.</p>
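<p>
To give a rough idea of what searching for a satisfying assignment means, here is a minimal
sketch of the classic DPLL procedure (unit propagation plus case splitting). It is written in
Python only to keep it short; the project itself asks for a functional language. It is not how
MiniSat works internally (MiniSat adds clause learning, watched literals and restarts). The
function names <code>simplify</code> and <code>dpll</code> and the encoding of literals as
non-zero integers are assumptions made for this illustration.
</p>
<pre>
```python
from typing import Dict, List, Optional

Clause = List[int]            # literal 3 means x3, literal -3 means "not x3"
Assignment = Dict[int, bool]  # variable number -> truth value


def simplify(clauses: List[Clause], lit: int) -> Optional[List[Clause]]:
    """Assume lit is true. Return None if some clause becomes empty."""
    result = []
    for clause in clauses:
        if lit in clause:
            continue                          # clause is already satisfied
        reduced = [l for l in clause if l != -lit]
        if not reduced:
            return None                       # conflict: no literal left
        result.append(reduced)
    return result


def dpll(clauses: List[Clause],
         assignment: Optional[Assignment] = None) -> Optional[Assignment]:
    """Return a (possibly partial) satisfying assignment, or None."""
    assignment = dict(assignment or {})
    if any(not clause for clause in clauses):
        return None                           # an empty clause is unsatisfiable
    # Unit propagation: a one-literal clause forces the value of its variable.
    while True:
        units = [clause[0] for clause in clauses if len(clause) == 1]
        if not units:
            break
        lit = units[0]
        assignment[abs(lit)] = lit > 0
        clauses = simplify(clauses, lit)
        if clauses is None:
            return None
    if not clauses:
        return assignment                     # every clause is satisfied
    # Case split on the first literal of the first remaining clause.
    lit = clauses[0][0]
    for choice in (lit, -lit):
        reduced = simplify(clauses, choice)
        if reduced is not None:
            model = dpll(reduced, {**assignment, abs(choice): choice > 0})
            if model is not None:
                return model
    return None
```
</pre>
<p>
For example, <code>dpll([[1, 2], [-1, 3], [-2, -3]])</code> returns an assignment that satisfies
all three clauses, while <code>dpll([[1], [-1]])</code> returns <code>None</code>. Variables that
never had to be decided are simply absent from the returned dictionary.
</p>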
<p>
<B>Tasks:</B> Understand MiniSat, design and code a SAT-solver in ML, and
empirically evaluate and tune your code.</p>
<p>
<B>Literature:</B> A good starting point for reading about SAT-solving is the handbook
article <A HREF="http://www.cs.cornell.edu/gomes/papers/SATSolvers-KR-Handbook.pdf">here</A>.
MiniSat is explained <A HREF="http://minisat.se/downloads/MiniSat.pdf">here</A> and
<A HREF="http://minisat.se/Papers.html">here</A>. The standard reference for ML is
<A HREF="http://www.cl.cam.ac.uk/~lp15/MLbook/">here</A> (I can lend you my copy
of this book for the duration of the project). The best free implementation of ML is
<A HREF="http://www.polyml.org/">PolyML</A>.
</p>
<li> <H4>[CU2] A Compiler for System F</H4>
<p><b>Description:</b>
<A HREF="http://en.wikipedia.org/wiki/System_F">System F</A> is a mini programming language,
which is often used to study the theory behind programming languages, but is also used as
a core language of functional programming languages (for example
<A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>). The language is small
enough that a compiler to an idealised assembly language (preferably
<A HREF="http://en.wikipedia.org/wiki/Typed_assembly_language">TAL</A>) or an abstract machine
can be implemented in a reasonable amount of time.
This has been explained in full detail in a PhD thesis by Louis-Julien Guillemette
(available in English <A HREF="https://papyrus.bib.umontreal.ca/jspui/bitstream/1866/3454/6/Guillemette_Louis-Julien_2009_these.pdf">here</A>). He used <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>
as his implementation language. Other choices are possible.
</p>
<p>
<b>Tasks:</b>
Read the relevant literature and implement the various components of a compiler
(parser, intermediate languages, simulator for the idealised assembly language).
This project is for a good student with an interest in programming languages,
who can also translate abstract ideas into code. If it is too difficult, the project can
be easily scaled down to the
<A HREF="http://en.wikipedia.org/wiki/Simply_typed_lambda_calculus">simply-typed
lambda calculus</A> (which is simpler than
System F) or to cover only some components of the compiler.
</p>
<p>
<B>Literature:</B>
The <A HREF="https://papyrus.bib.umontreal.ca/jspui/bitstream/1866/3454/6/Guillemette_Louis-Julien_2009_these.pdf">PhD thesis</A> by Louis-Julien Guillemette is required reading. A shorter
paper about this subject is available <A HREF="http://www.iro.umontreal.ca/~monnier/icfp08.pdf">here</A>.
A good starting point for TAL is <A HREF="http://www.cs.cornell.edu/talc/papers/tal-tr.pdf">here</A>.
There is a lot of literature about compilers
(for example <A HREF="http://www.cs.princeton.edu/~appel/papers/cwc.html">this book</A> -
I can lend you my copy for the duration of the project).
</p>
<li> <H4>[CU3] Sorting Suffixes</H4>
<p><b>Description:</b> Given a string, take all its suffixes, and sort them.
This is often called <A HREF="http://en.wikipedia.org/wiki/Suffix_array">suffix
array sorting</A>. It sounds simple, but there are some difficulties.
The naive algorithm would generate all suffix strings and sort them
using a standard sorting algorithm, for example
<A HREF="http://en.wikipedia.org/wiki/Quicksort">quicksort</A>.
The problem is that
this algorithm is not optimal for suffix sorting: it does not exploit the fact that the strings
being sorted are all suffixes of the same string, and it takes a quadratic amount of space. This is a
huge problem if you have to sort strings of several megabytes or even gigabytes,
as often happens in biotech and DNA data mining. Suffix sorting is also a crucial operation for the
<A HREF="http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform">Burrows-Wheeler transform</A>
on which the data compression algorithm of the popular
<A HREF="http://en.wikipedia.org/wiki/Bzip2">bzip2</A>
program is based.
</p>
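<p>
The naive approach described above can be sketched in a few lines of Python. This is only the
baseline, not one of the efficient algorithms: every sort key is a fresh copy of a suffix, which
is where the quadratic space goes. The function names and the <code>$</code> end marker are
assumptions made for this illustration; the marker must not occur in the input and is assumed to
sort before every other character in it.
</p>
<pre>
```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Naive suffix sorting: order the start positions by their suffixes."""
    # text[i:] copies the suffix, so all keys together need quadratic space.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform of text + sentinel via the suffix array."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Output the character in front of each sorted suffix; for the suffix
    # starting at position 0 this wraps around to the sentinel (s[-1]).
    return "".join(s[i - 1] for i in suffix_array(s))
```
</pre>
<p>
For the string <code>banana</code>, <code>suffix_array</code> gives
<code>[5, 3, 1, 0, 4, 2]</code> and <code>bwt</code> gives <code>annb$aa</code>.
</p>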
<p>
There are more efficient algorithms for suffix sorting, for example
<A HREF="http://books.google.co.uk/books?id=Pn1sHToYf9oC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false">here</A> and
<A HREF="http://ls11-www.cs.uni-dortmund.de/people/rahmann/teaching/ss2008/AlgorithmenAufSequenzen/09-walk-bwt.pdf">here</A>.
However, the most space-efficient algorithm for suffix sorting
(<A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">here</A>)
is horrendously complicated. Your task would be to understand it, and then implement it.
</p>
<p>
<B>Tasks:</B>
Start by reading the literature about suffix sorting. Then work through the
12-page <A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">paper</A>
explaining the horrendously complicated algorithm and implement it.
Time permitting, the work can include an implementation of the Burrows-Wheeler
data compression. This project is for a good student who likes to study algorithms
in depth. The project can be carried out in almost any programming language,
including C, Java, Scala, ML, Haskell and so on.
</p>
<p>
<B>Literature:</B> A good starting point for reading about suffix sorting is the
<A HREF="http://books.google.co.uk/books?id=Pn1sHToYf9oC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false">book</A> by Crochemore. Two simple algorithms are also described
<A HREF="http://ls11-www.cs.uni-dortmund.de/people/rahmann/teaching/ss2008/AlgorithmenAufSequenzen/09-walk-bwt.pdf">here</A>. The main literature is the 12-page
<A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">article</A> about in-place
suffix sorting. The Burrows-Wheeler data compression is described
<A HREF="http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.pdf">here</A>.
</p>
<li> <H4>[CU4] Simplification with Equivalence Relations in the Isabelle Theorem Prover</H4>
<p>
<B>Description:</B>
In this project you have to extend the simplifier of the
<A HREF="http://isabelle.in.tum.de/">Isabelle theorem prover</A>.
The simplifier is an important reasoning tool of this theorem prover: it
replaces a term by another term that can be proved to be equal to it. However,
currently the simplifier only rewrites terms according to equalities.
Assuming ≈ is an equivalence relation, the simplifier should also be able
to rewrite terms according to ≈. Since equivalence relations occur
frequently in automated reasoning, this extension would make the simplifier
more powerful and useful. The hope is that your code can go into the
code base of Isabelle.
</p>
<p>
<B>Tasks:</B>
Read the <A HREF="http://www.springerlink.com/content/x7041m1807738832/">paper</A>
about rewriting with equivalence relations. Get familiar with parts of the
implementation of Isabelle (I will be of as much help as I can). Implement
the extension. This project is suitable for a student with a bit of math background.
It requires knowledge of the functional programming language ML, which
however can be learned quickly provided you have already written code
in another functional programming language.
</p>
<p>
<B>Literature:</B> A good starting point for reading about rewriting modulo equivalences
is the paper <A HREF="http://www.springerlink.com/content/x7041m1807738832/">here</A>,
which uses the ACL2 theorem prover. The implementation of the Isabelle theorem
prover is described in much detail in this
<A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Cookbook/">programming tutorial</A>.
The standard reference for ML is
<A HREF="http://www.cl.cam.ac.uk/~lp15/MLbook/">here</A> (I can lend you my copy
of this book for the duration of the project).
</p>
<li><h4>[CU5] Lexing and Parsing with Derivatives</h4>
<p>
<B>Description:</B>
Lexing and parsing are usually done using automated tools, like
<A HREF="http://en.wikipedia.org/wiki/Lex_programming_tool">lex</A> and
<A HREF="http://en.wikipedia.org/wiki/Yacc">yacc</A>. The problem
with them is that they "work when they work", but if they do not, then they are
<A HREF="http://en.wikipedia.org/wiki/Black_box">black boxes</A>
which are difficult to debug and change. They are clumsy
to the point that Might and Darais wrote a paper titled
"<A HREF="http://arxiv.org/pdf/1010.5023v1">Yacc is dead</A>".</p>
<p>
There is a simple algorithm for regular expression matching (that is, lexing).
This algorithm was introduced by
<A HREF="http://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)">Brzozowski</A>
in 1964. It is based on the notion of derivatives of regular expressions and
has proved <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">useful</A>
for practical lexing. Last year the notion of derivatives was extended by
<A HREF="http://matt.might.net/papers/might2011derivatives.pdf">Might et al</A>
to <A HREF="http://en.wikipedia.org/wiki/Context-free_grammar">context-free grammars</A>
and parsing.
</p>
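<p>
To show how little code the regular expression part needs, here is a minimal Python sketch of a
matcher based on Brzozowski derivatives. The class names (<code>Zero</code>, <code>One</code>,
<code>Char</code>, <code>Alt</code>, <code>Seq</code>, <code>Star</code>) and function names are
assumptions made for this illustration. The sketch does not simplify the derivatives, so the
expressions can grow quickly on long inputs; a serious implementation would need to address that.
</p>
<pre>
```python
from dataclasses import dataclass


class Rexp:
    """Base class for regular expressions."""


@dataclass(frozen=True)
class Zero(Rexp):          # matches no string at all
    pass


@dataclass(frozen=True)
class One(Rexp):           # matches only the empty string
    pass


@dataclass(frozen=True)
class Char(Rexp):          # matches the single character c
    c: str


@dataclass(frozen=True)
class Alt(Rexp):           # r1 or r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Seq(Rexp):           # r1 followed by r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Star(Rexp):          # zero or more repetitions of r
    r: Rexp


def nullable(r: Rexp) -> bool:
    """Can r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False           # Zero and Char


def der(c: str, r: Rexp) -> Rexp:
    """Derivative of r: matches s exactly when r matches c followed by s."""
    if isinstance(r, Char):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    return Zero()          # Zero and One


def matches(r: Rexp, s: str) -> bool:
    """Take the derivative for each character, then test for nullability."""
    for c in s:
        r = der(c, r)
    return nullable(r)
```
</pre>
<p>
With <code>r = Seq(Star(Alt(Char("a"), Char("b"))), Char("c"))</code>, that is (a|b)*c,
<code>matches(r, "abac")</code> is true and <code>matches(r, "abca")</code> is false.
</p>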
<p>
<B>Tasks:</B> Get familiar with the two algorithms and implement them. Regular
expression matching is relatively simple; parsing with derivatives is the
harder part. Therefore you should empirically evaluate this part and
tune your implementation. The project can be carried out in almost all programming
languages, including C, Java, Scala, ML, Haskell and so on.
</p>
<p>
<B>Literature:</B> This
<A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">paper</A>
gives a modern introduction to derivative-based lexing. Derivative-based
parsing is explained <A HREF="http://arxiv.org/pdf/1010.5023v1">here</A>
and <A HREF="http://matt.might.net/papers/might2011derivatives.pdf">here</A>.
</p>
<li> <H4>[CU6] Equivalence Checking of Regular Expressions using the Method by Antimirov and Mosses</H4>
<p>
<B>Description:</B>
An algorithm for deciding the equivalence of regular expressions can be used
to decide a number of problems in automated reasoning. Therefore one would like to
have a method for equivalence checking that is as fast as possible. A number of
algorithms have been proposed in the past, but one based on a method
by Antimirov and Mosses seems relatively simple and easy to implement.
</p>
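<p>
To make the problem concrete, here is a small Python sketch of one simple way to decide
equivalence. It is <b>not</b> the rewrite system of Antimirov and Mosses; it is a swapped-in,
simpler check that uses Antimirov's partial derivatives and explores pairs of derivative sets
until either a pair disagrees on accepting the empty string or no new pairs appear. The tuple
encoding of regular expressions and all function names are assumptions made for this illustration,
and no attempt is made to be fast.
</p>
<pre>
```python
ZERO, ONE = ("0",), ("1",)      # empty language, empty string


def char(c): return ("c", c)
def alt(r, s): return ("+", r, s)
def seq(r, s): return (".", r, s)
def star(r): return ("*", r)


def nullable(r):
    """Can r match the empty string?"""
    tag = r[0]
    if tag in ("1", "*"):
        return True
    if tag == "+":
        return nullable(r[1]) or nullable(r[2])
    if tag == ".":
        return nullable(r[1]) and nullable(r[2])
    return False


def pder(c, r):
    """Partial derivative: a set of expressions whose union is the derivative."""
    tag = r[0]
    if tag == "c":
        return frozenset([ONE]) if r[1] == c else frozenset()
    if tag == "+":
        return pder(c, r[1]) | pder(c, r[2])
    if tag == ".":
        left = frozenset(seq(p, r[2]) for p in pder(c, r[1]))
        return left | pder(c, r[2]) if nullable(r[1]) else left
    if tag == "*":
        return frozenset(seq(p, r) for p in pder(c, r[1]))
    return frozenset()           # ZERO and ONE


def chars(r):
    """All characters occurring in r."""
    if r[0] == "c":
        return {r[1]}
    return set().union(*(chars(x) for x in r[1:]))


def step(c, rs):
    """Partial derivatives of a whole set of expressions."""
    return frozenset().union(*(pder(c, r) for r in rs))


def equivalent(r, s):
    """Do r and s match exactly the same strings?"""
    alphabet = chars(r) | chars(s)
    start = (frozenset([r]), frozenset([s]))
    seen, todo = {start}, [start]
    while todo:
        left, right = todo.pop()
        if any(map(nullable, left)) != any(map(nullable, right)):
            return False         # some string is matched by only one side
        for c in alphabet:
            pair = (step(c, left), step(c, right))
            if pair not in seen:
                seen.add(pair)
                todo.append(pair)
    return True
```
</pre>
<p>
For example, with <code>a = char("a")</code> and <code>b = char("b")</code>,
<code>equivalent(star(alt(a, b)), star(seq(star(a), star(b))))</code> is true, while
<code>equivalent(star(a), seq(a, star(a)))</code> is false. The search stops because an
expression has only finitely many partial derivatives.
</p>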
<p>
<B>Tasks:</B>
The task is to implement the algorithm by Antimirov and Mosses and compare it to
other methods. Hopefully the algorithm can be tuned to be faster than other
methods. The project can be carried out in almost all programming languages, but
as usual functional programming languages such as Scala, ML and Haskell have an edge
for this kind of problem.
</p>
<p>
<B>Literature:</B>
Central to this project is the paper <A HREF="http://www.dcc.fc.up.pt/~nam/publica/ijcs08.pdf">here</A>.
Other methods have been described, for example,
<A HREF="http://www4.informatik.tu-muenchen.de/~krauss/papers/rexp.pdf">here</A>.
A relatively complicated method, based on automata, is described
<A HREF="http://sardes.inrialpes.fr/~braibant/atbr/">here</A>.
</p>
</ul>
</TD>
</TR>
</TABLE>
<P><!-- Created: Tue Mar 4 00:23:25 GMT 1997 -->
<!-- hhmts start -->
Last modified: Tue Dec 6 08:41:27 GMT 2011
<!-- hhmts end -->
<a href="http://validator.w3.org/check/referer">[Validate this page.]</a>
</BODY>
</HTML>