-
[CU1] Implementing a SAT-Solver in a Functional Programming Language
Description:
SAT-solvers search for satisfying assignments of boolean formulas. Although this
is a computationally hard problem (NP-complete),
modern SAT-solvers routinely solve boolean formulas with 100,000 or more variables.
The application areas of SAT-solvers are manifold: they range from hardware verification to
Sudoku solvers. Every two years there is a competition of the best SAT-solvers in the world.
Most SAT-solvers are written in C. The aim of this project is to design and implement
a SAT-solver in a functional programming language (preferably
ML, but
Haskell,
Scala,
OCaml, ... are also OK). The starting point is
the open source SAT-solver MiniSat (available here).
The long-term hope is that your implementation becomes part of the interactive theorem prover
Isabelle. For this
the SAT-solver needs to be implemented in ML.
Tasks: Understand MiniSat; design and code a SAT-solver in ML;
empirically evaluate and tune your code.
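The core of such a solver fits in a few lines. The following sketch (in OCaml, standing in for ML) implements the basic DPLL procedure, that is unit propagation plus case splitting, on top of which MiniSat builds its much more refined search; the clause representation and names are illustrative choices, not MiniSat's:

```ocaml
(* A minimal DPLL sketch: literals are non-zero ints (-l is the
   negation of l), a clause is a literal list, a formula a clause list.
   MiniSat adds watched literals, clause learning and branching
   heuristics on top of this basic recursion. *)

(* Assign literal l true: drop satisfied clauses, shrink the others. *)
let assign l formula =
  List.filter_map
    (fun clause ->
       if List.mem l clause then None                       (* clause satisfied *)
       else Some (List.filter (fun l' -> l' <> -l) clause)) (* -l is now false *)
    formula

(* Return true iff the formula is satisfiable. *)
let rec dpll formula =
  match formula with
  | [] -> true                           (* no clauses left: satisfied *)
  | _ when List.mem [] formula -> false  (* empty clause: conflict *)
  | _ ->
    (* unit propagation: a one-literal clause forces that literal *)
    (match List.find_opt (fun c -> List.length c = 1) formula with
     | Some [l] -> dpll (assign l formula)
     | _ ->
       (* otherwise branch on the first literal of the first clause *)
       let l = List.hd (List.hd formula) in
       dpll (assign l formula) || dpll (assign (-l) formula))
```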
Literature: A good starting point for reading about SAT-solving is the handbook
article here.
MiniSat is explained here and
here. The standard reference for ML is
here (I can lend you my copy
of this book for the duration of the project). The best free implementation of ML is
PolyML.
-
[CU2] A Compiler for System F
Description:
System F is a mini programming language,
which is often used to study the theory behind programming languages, but it is also used as
a core language of functional programming languages (for example
Haskell). The language is small
enough that a compiler to an
idealised assembly language (preferably
TAL) or to an abstract machine can be implemented in a reasonable amount of time.
This has been explained in full detail in a PhD-thesis by Louis-Julien Guillemette
(available in English here). He used Haskell
as his implementation language. Other choices are possible.
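To fix ideas, here is an OCaml sketch of the syntax of System F together with a naive type checker. The datatypes and names are illustrative and far simpler than the representations used in the thesis; in particular, capture-avoiding substitution is sidestepped by assuming distinct bound type variables:

```ocaml
(* Syntax of System F: types and terms. *)
type ty =
  | TyVar of string               (* a *)
  | Arrow of ty * ty              (* t1 -> t2 *)
  | Forall of string * ty         (* forall a. t *)

type term =
  | Var of string                 (* x *)
  | Lam of string * ty * term     (* \x:t. e *)
  | App of term * term            (* e1 e2 *)
  | TyLam of string * term        (* /\a. e *)
  | TyApp of term * ty            (* e [t] *)

(* substitute type s for type variable a; assumes distinct binders,
   so no capture-avoidance is attempted *)
let rec subst_ty a s = function
  | TyVar b -> if a = b then s else TyVar b
  | Arrow (t1, t2) -> Arrow (subst_ty a s t1, subst_ty a s t2)
  | Forall (b, t) -> if a = b then Forall (b, t) else Forall (b, subst_ty a s t)

exception Type_error

(* type checking against an environment of term-variable typings *)
let rec type_of env = function
  | Var x -> List.assoc x env
  | Lam (x, t, e) -> Arrow (t, type_of ((x, t) :: env) e)
  | App (e1, e2) ->
    (match type_of env e1 with
     | Arrow (t1, t2) when t1 = type_of env e2 -> t2
     | _ -> raise Type_error)
  | TyLam (a, e) -> Forall (a, type_of env e)
  | TyApp (e, s) ->
    (match type_of env e with
     | Forall (a, t) -> subst_ty a s t
     | _ -> raise Type_error)

(* the polymorphic identity: /\a. \x:a. x  :  forall a. a -> a *)
let id = TyLam ("a", Lam ("x", TyVar "a", Var "x"))
```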
Tasks:
Read the relevant literature and implement the various components of a compiler
(parser, intermediate languages, simulator for the idealised assembly language).
This project is for a good student with an interest in programming languages,
who can also translate abstract ideas into code. If it is too difficult, the project can
be easily scaled down to the
simply-typed
lambda calculus (which is simpler than
System F) or to cover only some components of the compiler.
Literature:
The PhD-thesis by Louis-Julien Guillemette is required reading. A shorter
paper about this subject is available here.
A good starting point for TAL is here.
There is a lot of literature about compilers
(for example this book -
I can lend you my copy for the duration of the project).
-
[CU3] Sorting Suffixes
Description: Given a string, take all its suffixes, and sort them.
This is often called suffix
array sorting. It sounds simple, but there are some difficulties.
The naive algorithm would generate all suffix strings and sort them
using a standard sorting algorithm, for example
quicksort.
The problem is that
this algorithm is not optimal for suffix sorting: it does not take into account that you are sorting
suffixes, and it also takes a quadratic amount of space. This is a
huge problem if you have to sort strings of several megabytes or even gigabytes,
as often happens in biotech and DNA data mining. Suffix sorting is also a crucial operation for the
Burrows-Wheeler transform
on which the data compression algorithm of the popular
bzip2
program is based.
There are more efficient algorithms for suffix sorting, for example
here and
here.
However, the most space-efficient algorithm for suffix sorting
(here)
is horrendously complicated. Your task would be to understand it, and then implement it.
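The naive algorithm mentioned above can be sketched in a few lines of OCaml; this is the baseline the project is meant to improve on:

```ocaml
(* Naive suffix array: sort the start positions 0..n-1 by comparing
   the suffixes they denote. String.sub copies each suffix afresh on
   every comparison, which is where the quadratic space goes; the
   efficient algorithms sort the indices without materialising
   suffixes at all. *)
let naive_suffix_array s =
  let n = String.length s in
  List.sort
    (fun i j -> compare (String.sub s i (n - i)) (String.sub s j (n - j)))
    (List.init n (fun i -> i))
```

For "banana" the suffixes in sorted order are "a", "ana", "anana", "banana", "na", "nana", so the result is the index list [5; 3; 1; 0; 4; 2].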
Tasks:
Start by reading the literature about suffix sorting. Then work through the
12-page paper
explaining the horrendously complicated algorithm and implement it.
Time permitting, the work can include an implementation of the Burrows-Wheeler
data compression. This project is for a good student who likes to study
algorithms in depth. The project can be carried out in almost all programming languages,
including C, Java, Scala, ML, Haskell and so on.
Literature: A good starting point for reading about suffix sorting is the
book by Crochemore. Two simple algorithms are also described
here. The main literature is the 12-page
article about in-place
suffix sorting. The Burrows-Wheeler data compression is described
here.
-
[CU4] Simplification with Equivalence Relations in the Isabelle Theorem Prover
Description:
In this project you have to extend the simplifier of the
Isabelle theorem prover.
The simplifier is an important reasoning tool of this theorem prover: it
replaces a term by another term that can be proved to be equal to it. However,
currently the simplifier only rewrites terms according to equalities.
Assuming ≈ is an equivalence relation, the simplifier should also be able
to rewrite terms according to ≈. Since equivalence relations occur
frequently in automated reasoning, this extension would make the simplifier
more powerful and useful. The hope is that your code can go into the
code base of Isabelle.
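To illustrate what a simplifier does (this is only a toy sketch in OCaml, not Isabelle's actual interface): it repeatedly replaces instances of the left-hand sides of rules by the corresponding right-hand sides. In the sketch the rules are ordinary equalities between ground terms; the project is about admitting rules whose two sides are related by an equivalence ≈ rather than by equality:

```ocaml
(* A toy term language and a toy simplifier. The rules are pairs
   (lhs, rhs) of ground terms, read as equations oriented left to
   right; the rewriter may loop if the rules are cyclic. *)
type term = Const of string | App of term * term

(* top-down rewriting: try the rules at each node, then recurse *)
let rec rewrite rules t =
  match List.assoc_opt t rules with
  | Some t' -> rewrite rules t'       (* rule fired: keep simplifying *)
  | None ->
    (match t with
     | App (f, x) -> App (rewrite rules f, rewrite rules x)
     | c -> c)
```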
Tasks:
Read the paper
about rewriting with equivalence relations. Get familiar with parts of the
implementation of Isabelle (I will help as much as I can). Implement
the extension. This project is suitable for a student with a bit of math background.
It requires knowledge of the functional programming language ML, which
however can be learned quickly provided you have already written code
in another functional programming language.
Literature: A good starting point for reading about rewriting modulo equivalences
is the paper here,
which uses the ACL2 theorem prover. The implementation of the Isabelle theorem
prover is described in much detail in this
programming tutorial.
The standard reference for ML is
here (I can lend you my copy
of this book for the duration of the project).
-
[CU5] Lexing and Parsing with Derivatives
Description:
Lexing and parsing are usually done using automated tools, like
lex and
yacc. The problem
with them is that they "work when they work", but if they do not, then they are
black boxes
which are difficult to debug and change. They are really quite
clumsy to the point that Might and Darais wrote a paper titled
"Yacc is dead".
There is a simple algorithm for regular expression matching (that is, lexing).
This algorithm was introduced by
Brzozowski
in 1964. It is based on the notion of derivatives of regular expressions and
has proved useful
for practical lexing. Last year the notion of derivatives was extended by
Might et al.
to context-free grammars
and parsing.
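Brzozowski's matcher is short enough to sketch here (in OCaml; any of the languages below would do): a regular expression matches a string iff taking the derivative with respect to each character in turn leaves an expression that matches the empty string:

```ocaml
(* Regular expressions and Brzozowski derivatives. *)
type re =
  | Zero                 (* matches nothing *)
  | One                  (* matches the empty string *)
  | Chr of char
  | Alt of re * re       (* r1 | r2 *)
  | Seq of re * re       (* r1 r2 *)
  | Star of re           (* r* *)

(* does r match the empty string? *)
let rec nullable = function
  | Zero | Chr _ -> false
  | One | Star _ -> true
  | Alt (r1, r2) -> nullable r1 || nullable r2
  | Seq (r1, r2) -> nullable r1 && nullable r2

(* derivative of r with respect to character c: the regular
   expression matching the rest of any word of r that starts with c *)
let rec deriv c = function
  | Zero | One -> Zero
  | Chr d -> if c = d then One else Zero
  | Alt (r1, r2) -> Alt (deriv c r1, deriv c r2)
  | Seq (r1, r2) ->
    if nullable r1
    then Alt (Seq (deriv c r1, r2), deriv c r2)
    else Seq (deriv c r1, r2)
  | Star r -> Seq (deriv c r, Star r)

(* fold the derivative over the string, then test for the empty word *)
let matches r s =
  let state = ref r in
  String.iter (fun c -> state := deriv c !state) s;
  nullable !state
```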
Tasks: Get familiar with the two algorithms and implement them. Regular
expression matching is relatively simple; parsing with derivatives is the
harder part. Therefore you should empirically evaluate this part and
tune your implementation. The project can be carried out in almost all programming
languages, including C, Java, Scala, ML, Haskell and so on.
Literature: This
paper
gives a modern introduction to derivative based lexing. Derivative-based
parsing is explained here
and here.
-
[CU6] Equivalence Checking of Regular Expressions using the Method by Antimirov and Mosses
Description:
Deciding the equivalence of regular expressions can be used
to solve a number of problems in automated reasoning. Therefore one would like to
have a method for equivalence checking that is as fast as possible. A number
of algorithms have been proposed in the past, but one based on a method
by Antimirov and Mosses seems relatively simple and easy to implement.
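To illustrate the general shape of such an equivalence check, here is an OCaml sketch that decides equivalence by a bisimulation on derivatives. Note the hedge: it uses Brzozowski derivatives with light simplification as a stand-in, whereas Antimirov and Mosses work with partial derivatives and a rewriting-based proof system, so this is not their algorithm, only a closely related baseline to compare against:

```ocaml
type re =
  | Zero | One | Chr of char
  | Alt of re * re | Seq of re * re | Star of re

let rec nullable = function
  | Zero | Chr _ -> false
  | One | Star _ -> true
  | Alt (r1, r2) -> nullable r1 || nullable r2
  | Seq (r1, r2) -> nullable r1 && nullable r2

(* smart constructors: just enough simplification to keep the set of
   reachable derivatives small; without it the loop need not terminate *)
let alt r1 r2 = match r1, r2 with
  | Zero, r | r, Zero -> r
  | _ -> if r1 = r2 then r1 else Alt (r1, r2)

let seq r1 r2 = match r1, r2 with
  | Zero, _ | _, Zero -> Zero
  | One, r | r, One -> r
  | _ -> Seq (r1, r2)

let rec deriv c = function
  | Zero | One -> Zero
  | Chr d -> if c = d then One else Zero
  | Alt (r1, r2) -> alt (deriv c r1) (deriv c r2)
  | Seq (r1, r2) ->
    if nullable r1 then alt (seq (deriv c r1) r2) (deriv c r2)
    else seq (deriv c r1) r2
  | Star r -> seq (deriv c r) (Star r)

(* r1 and r2 are equivalent iff no reachable pair of derivatives
   disagrees on accepting the empty string (a bisimulation check) *)
let equiv alphabet r1 r2 =
  let rec loop seen = function
    | [] -> true
    | (r1, r2) :: rest when List.mem (r1, r2) seen -> loop seen rest
    | (r1, r2) :: rest ->
      nullable r1 = nullable r2
      && loop ((r1, r2) :: seen)
           (List.map (fun c -> (deriv c r1, deriv c r2)) alphabet @ rest)
  in
  loop [] [(r1, r2)]
```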
Tasks:
The task is to implement the algorithm by Antimirov and Mosses and compare it to
other methods. Hopefully the algorithm can be tuned to be faster than other
methods. The project can be carried out in almost all programming languages, but
as usual functional programming languages such as Scala, ML and Haskell have an edge
for this kind of problem.
Literature:
Central to this project is the paper here.
Other methods have been described, for example,
here.
A relatively complicated method, based on automata, is described
here.