diff -r a6c077ba850a -r 790a40046dc8 projects.html
--- a/projects.html Thu Dec 01 23:36:19 2011 +0000
+++ b/projects.html Fri Dec 02 03:28:02 2011 +0000
@@ -31,6 +31,7 @@
Tasks: Understand MiniSat, design and code a SAT-solver in ML, @@ -59,7 +61,7 @@
Literature: A good starting point for reading about SAT-solving is the handbook - article in here. + article here. MiniSat is explained here and here. The standard reference for ML is here (I can lend you my copy @@ -79,7 +81,7 @@ TAL) or an abstract machine. This has been explained in full detail in a PhD-thesis by Louis-Julien Guillemette (available in English here). He used Haskell - as his implementation language. Other choices are of course possible. + as his implementation language. Other choices are possible.
@@ -88,10 +90,10 @@ (parser, intermediate languages, simulator for the idealised assembly language). This project is for a good student with an interest in programming languages, who can also translate abstract ideas into code. If it is too difficult, the project can - easily be scaled back to the + be easily scaled down to the simply-typed lambda calculus (which is simpler than - System F) or only some components of the compiler are implemented. + System F) or to cover only some components of the compiler.
@@ -107,48 +109,152 @@
Description: Given a string, take all its suffixes, and sort them. - This is often also called suffix + This is often called suffix array sorting. It sounds simple, but there are some difficulties. - The naive algorithm would generate all (suffix) strings and sort them - using a standard sorting algorithm, for example quick-sort. Unfortunately, - this algorithm is not optimal (it does not take into account that you sort - suffixes) and it also takes an quadratic amount of space, which is a - problem if you have to sort strings of several Mega-Bytes or even Giga-Bytes - (happens often in biotech DNA information.
+ The naive algorithm would generate all suffix strings and sort them + using a standard sorting algorithm, for example + quicksort. + The problem is that + this algorithm is not optimal for suffix sorting: it does not take into account that you sort + suffixes and it also takes a quadratic amount of space. This is a + huge problem if you have to sort strings of several Megabytes or even Gigabytes, + as happens often in biotech and DNA data mining. Suffix sorting is also a crucial operation for the + Burrows-Wheeler transform + on which the data compression algorithm of the popular + bzip2 + program is based. +
- Aim: the notion of index on a text is central in many methods for text - processing and for the management of textual databases. Suffix Arrays is one - of these methods based on the sorted list of suffixes of the input text. The - project consists in implementing a linear-time sorting algorithm and other - elements related to Suffix Array construction and to Burrows-Wheeler text - compression. Plan: study of the sorting problem in the literature starting - with the reference below. Implementation of the sorting algorithm and the - LCP computation to obtain a Suffix Array construction software. Then, using - this work, implementation of the algorithms described in the second - reference below. Deliverables: report, suffix sorting and associated - software and their documentation. ++ There are more efficient algorithms for suffix sorting, for example + here and + here. + However, the most space-efficient algorithm for suffix sorting + (here) + is horrendously complicated. Your task would be to understand it, and then implement it. +
+ ++ Tasks: + Start by reading the literature about suffix sorting. Then work through the + 12-page paper + explaining the horrendously complicated algorithm and implement it. + Time permitting, the work can include an implementation of the Burrows-Wheeler + data compression. This project is for a good student who likes to study algorithms + in depth. The project can be carried out in almost all programming languages, + including C, Java, Scala, ML, Haskell and so on. +
+ ++ Literature: A good starting point for reading about suffix sorting is the + book by Crochemore. Two simple algorithms are also described + here. The main literature is the 12-page + article about in-place + suffix sorting. The Burrows-Wheeler data compression is described + here. +
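To make the naive approach from the description concrete, here is a minimal sketch in Python. It is only an illustration of the baseline the project is meant to improve on, not the in-place algorithm from the 12-page paper. The function names `naive_suffix_array` and `bwt_from_suffix_array` are hypothetical, and the `$` sentinel is an assumption: it must not occur in the input and must compare smaller than every other character.

```python
def naive_suffix_array(text):
    """Return the start indices of all suffixes of text in sorted order.

    Every suffix is materialised as its own string, so the space used is
    quadratic in len(text), which is exactly the problem described above.
    """
    suffixes = [(text[i:], i) for i in range(len(text))]
    suffixes.sort()
    return [start for _, start in suffixes]


def bwt_from_suffix_array(text, sentinel="$"):
    """Burrows-Wheeler transform computed from the sorted suffixes.

    Assumption: sentinel does not occur in text and sorts before all
    other characters (true for '$' against ASCII letters and digits).
    """
    s = text + sentinel
    # The transform takes the character preceding each sorted suffix;
    # index -1 wraps around to the sentinel for the suffix starting at 0.
    return "".join(s[i - 1] for i in naive_suffix_array(s))


if __name__ == "__main__":
    print(naive_suffix_array("banana"))
    print(bwt_from_suffix_array("banana"))
```

Comparing two suffixes can itself take linear time, so this baseline is slow as well as space-hungry on long repetitive inputs such as DNA data.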
+ ++ Description: + In this project you have to extend the simplifier of the + Isabelle theorem prover. + The simplifier is an important reasoning tool of this theorem prover: it + replaces a term by another term that can be proved to be equal to it. However, + currently the simplifier only rewrites terms according to equalities. + Assuming ≈ is an equivalence relation, the simplifier should also be able + to rewrite terms according to ≈. Since equivalence relations occur + frequently in automated reasoning, this extension would make the simplifier + more powerful and useful. The hope is that your code can go into the + code base of Isabelle. +
+ ++ Tasks: + Read the paper + about rewriting with equivalence relations. Get familiar with parts of the + implementation of Isabelle (I will be of as much help as I can). Implement + the extension. This project is suitable for a student with a bit of math background. + It requires knowledge of the functional programming language ML, which + however can be learned quickly provided you have already written code + in another functional programming language. +
- References: - J. Kärkkäinen and P. Sanders, Simple linear work suffix array construction, in ICALP'03, LNCS 2719, Spinger, 2003, pp. 943--955. - M. Crochemore, J. Désarménien and D. Perrin, A note on the Burrows-Wheeler transformation, Theoret. Comput. Sci., 2005, to appear. ++ Literature: A good starting point for reading about rewriting modulo equivalences + is the paper here, + which uses the ACL2 theorem prover. The implementation of the Isabelle theorem + prover is described in much detail in this + programming tutorial. + The standard reference for ML is + here (I can lend you my copy + of this book for the duration of the project). +
- There is a horrendously complicated algorithm for solving these problems. - Your task would be to understand it, and then implement it. + ++ Description: + Lexing and parsing are usually done using automated tools, like + lex and + yacc. The problem + with them is that they "work when they work", but if not, they are + black boxes + which are difficult to debug and change. They are really quite + clumsy, to the point that Might wrote a paper titled + "Yacc is dead".
+ ++ There is a simple algorithm for regular expression matching (that is, lexing). + This algorithm was introduced by + Brzozowski + in 1964. It is based on the notion of derivatives of regular expressions and + has proved useful + for practical lexing. Last year the notion of derivatives was extended by + Might et al. + to context-free grammars + and parsing. +
+ ++ Tasks: Get familiar with the two algorithms and implement them. Regular + expression matching is relatively simple; parsing with derivatives is the + harder part. Therefore you should empirically evaluate this part and + tune your implementation. The project can be carried out in almost all programming + languages, including C, Java, Scala, ML, Haskell and so on. +
-+ Literature: This + paper + gives a modern introduction to derivative-based lexing. Derivative-based + parsing is explained here + and here. +
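The regular-expression half of this project can be sketched in a few lines. The following Python sketch is one possible encoding of Brzozowski's derivative idea, not the implementation the project asks for: the class names (`Empty`, `Eps`, `Chr`, `Alt`, `Seq`, `Star`) and the function names are hypothetical, and no simplification of intermediate expressions is performed, so derivatives can grow quickly on long inputs.

```python
from dataclasses import dataclass


class Regex:
    """Base class for regular expression syntax trees."""


@dataclass(frozen=True)
class Empty(Regex):      # matches no string at all
    pass


@dataclass(frozen=True)
class Eps(Regex):        # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Regex):        # matches the single character c
    c: str


@dataclass(frozen=True)
class Alt(Regex):        # left | right
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Seq(Regex):        # left followed by right
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):       # zero or more repetitions of inner
    inner: Regex


def nullable(r):
    """Does r match the empty string?"""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.left) and nullable(r.right)
    raise TypeError(f"not a regex: {r!r}")


def deriv(r, c):
    """Derivative of r with respect to c: matches s exactly when r matches c + s."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, c), deriv(r.right, c))
    if isinstance(r, Seq):
        head = Seq(deriv(r.left, c), r.right)
        # If left can match the empty string, c may also be consumed by right.
        return Alt(head, deriv(r.right, c)) if nullable(r.left) else head
    if isinstance(r, Star):
        return Seq(deriv(r.inner, c), r)
    raise TypeError(f"not a regex: {r!r}")


def matches(r, s):
    """Take the derivative for each character in turn, then test nullability."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)


if __name__ == "__main__":
    ab_star_c = Seq(Star(Alt(Chr("a"), Chr("b"))), Chr("c"))   # (a|b)*c
    print(matches(ab_star_c, "abac"))
```

A tuned implementation would simplify after each derivative step (for example rewriting `Alt(Empty, r)` to `r`), which is the kind of optimisation the empirical evaluation in the task description is meant to explore.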
+ ++ Description: + Solving the problem of deciding equivalence of regular expressions can be used + to decide a number of problems in automated reasoning. Therefore one likes to + have a method for equivalence checking that is as fast as possible. +
+ ++ Tasks: + The task is to implement the algorithm by Antimirov and Mosses and compare it to + other methods. Hopefully the algorithm can be tuned to be faster than other + methods. +
-+ Literature: + Central to this project is the paper here. + Other methods have been described, for example, + here. +
-Last modified: Thu Dec 1 18:10:37 GMT 2011 +Last modified: Fri Dec 2 03:26:32 GMT 2011 [Validate this page.]