--- a/projects.html Thu Dec 01 23:36:19 2011 +0000
+++ b/projects.html Fri Dec 02 03:28:02 2011 +0000
@@ -31,6 +31,7 @@
<H2>2011/12 MSc Individual Projects</H2>
<H4>Supervisor: Christian Urban</H4>
<H4>Email: @kcl Office: Strand Building S6.30</H4>
+<H4>If you are interested in a project, please send me an email and we can discuss the details.</H4>
<ul class="striped">
<li> <H4>[CU1] Implementing a SAT-Solver in a Functional Programming Language</H4>
@@ -51,7 +52,8 @@
<A HREF="http://caml.inria.fr/">OCaml</A>, ... are also OK). Starting point is
the open source SAT-solver MiniSat (available <A HREF="http://minisat.se/Main.html">here</A>).
The long-term hope is that your implementation becomes part of the interactive theorem prover
- <A HREF="http://www.cl.cam.ac.uk/research/hvg/isabelle/">Isabelle</A>.</p>
+ <A HREF="http://www.cl.cam.ac.uk/research/hvg/isabelle/">Isabelle</A>. For this
+ the SAT-solver needs to be implemented in ML.</p>
<p>
<B>Tasks:</B> Understand MiniSat, design and code a SAT-solver in ML,
@@ -59,7 +61,7 @@
<p>
<B>Literature:</B> A good starting point for reading about SAT-solving is the handbook
- article in <A HREF="http://www.cs.cornell.edu/gomes/papers/SATSolvers-KR-Handbook.pdf">here</A>.
+ article <A HREF="http://www.cs.cornell.edu/gomes/papers/SATSolvers-KR-Handbook.pdf">here</A>.
MiniSat is explained <A HREF="http://minisat.se/downloads/MiniSat.pdf">here</A> and
<A HREF="http://minisat.se/Papers.html">here</A>. The standard reference for ML is
<A HREF="http://www.cl.cam.ac.uk/~lp15/MLbook/">here</A> (I can lend you my copy
@@ -79,7 +81,7 @@
<A HREF="http://en.wikipedia.org/wiki/Typed_assembly_language">TAL</A>) or an abstract machine.
This has been explained in full detail in a PhD-thesis by Louis-Julien Guillemette
(available in English <A HREF="https://papyrus.bib.umontreal.ca/jspui/bitstream/1866/3454/6/Guillemette_Louis-Julien_2009_these.pdf">here</A>). He used <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>
- as his implementation language. Other choices are of course possible.
+ as his implementation language. Other choices are possible.
</p>
<p>
@@ -88,10 +90,10 @@
(parser, intermediate languages, simulator for the idealised assembly language).
This project is for a good student with an interest in programming languages,
who can also translate abstract ideas into code. If it is too difficult, the project can
- easily be scaled back to the
+ be easily scaled down to the
<A HREF="http://en.wikipedia.org/wiki/Simply_typed_lambda_calculus">simply-typed
lambda calculus</A> (which is simpler than
- System F) or only some components of the compiler are implemented.
+ System F) or to cover only some components of the compiler.
</p>
<p>
@@ -107,48 +109,152 @@
<li> <H4>[CU3] Sorting Suffixes</H4>
<p><b>Description:</b> Given a string, take all its suffixes, and sort them.
- This is often also called <A HREF="http://en.wikipedia.org/wiki/Suffix_array">suffix
+ This is often called <A HREF="http://en.wikipedia.org/wiki/Suffix_array">suffix
array sorting</A>. It sound simple, but there are some difficulties.
- The naive algorithm would generate all (suffix) strings and sort them
- using a standard sorting algorithm, for example quick-sort. Unfortunately,
- this algorithm is not optimal (it does not take into account that you sort
- suffixes) and it also takes an quadratic amount of space, which is a
- problem if you have to sort strings of several Mega-Bytes or even Giga-Bytes
- (happens often in biotech DNA information.<p>
+ The naive algorithm would generate all suffix strings and sort them
+ using a standard sorting algorithm, for example
+ <A HREF="http://en.wikipedia.org/wiki/Quicksort">quicksort</A>.
+ The problem is that
+ this algorithm is not optimal for suffix sorting: it does not take into account that you sort
+ suffixes and it also takes a quadratic amount of space. This is a
+ huge problem if you have to sort strings of several megabytes or even gigabytes,
+ as happens often in biotech and DNA data mining. Suffix sorting is also a crucial operation for the
+ <A HREF="http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform">Burrows-Wheeler transform</A>
+ on which the data compression algorithm of the popular
+ <A HREF="http://en.wikipedia.org/wiki/Bzip2">bzip2</A>
+ program is based.
+ </p>
- Aim: the notion of index on a text is central in many methods for text
- processing and for the management of textual databases. Suffix Arrays is one
- of these methods based on the sorted list of suffixes of the input text. The
- project consists in implementing a linear-time sorting algorithm and other
- elements related to Suffix Array construction and to Burrows-Wheeler text
- compression. Plan: study of the sorting problem in the literature starting
- with the reference below. Implementation of the sorting algorithm and the
- LCP computation to obtain a Suffix Array construction software. Then, using
- this work, implementation of the algorithms described in the second
- reference below. Deliverables: report, suffix sorting and associated
- software and their documentation.
+ <p>
+ There are more efficient algorithms for suffix sorting, for example
+ <A HREF="http://books.google.co.uk/books?id=Pn1sHToYf9oC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false">here</A> and
+ <A HREF="http://ls11-www.cs.uni-dortmund.de/people/rahmann/teaching/ss2008/AlgorithmenAufSequenzen/09-walk-bwt.pdf">here</A>.
+ However, the most space-efficient algorithm for suffix sorting
+ (<A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">here</A>)
+ is horrendously complicated. Your task would be to understand it, and then implement it.
+ </p>
+
+ <p>
+ <B>Tasks:</B>
+ Start by reading the literature about suffix sorting. Then work through the
+ 12-page <A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">paper</A>
+ explaining the horrendously complicated algorithm and implement it.
+ Time permitting, the work can include an implementation of the Burrows-Wheeler
+ data compression. This project is for a good student who likes to study algorithms
+ in depth. The project can be carried out in almost all programming languages,
+ including C, Java, Scala, ML, Haskell and so on.
+ </p>
+
+ <p>
+ <B>Literature:</B> A good starting point for reading about suffix sorting is the
+ <A HREF="http://books.google.co.uk/books?id=Pn1sHToYf9oC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false">book</A> by Crochemore. Two simple algorithms are also described
+ <A HREF="http://ls11-www.cs.uni-dortmund.de/people/rahmann/teaching/ss2008/AlgorithmenAufSequenzen/09-walk-bwt.pdf">here</A>. The main literature is the 12-page
+ <A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">article</A> about in-place
+ suffix sorting. The Burrows-Wheeler data compression is described
+ <A HREF="http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.pdf">here</A>.
+ </p>
+
+<li> <H4>[CU4] Simplification with Equivalence Relations in the Isabelle Theorem Prover</H4>
+ <p>
+ <B>Description:</B>
+ In this project you have to extend the simplifier of the
+ <A HREF="http://isabelle.in.tum.de/">Isabelle theorem prover</A>.
+ The simplifier is an important reasoning tool of this theorem prover: it
+ replaces a term by another term that can be proved to be equal to it. However,
+ currently the simplifier only rewrites terms according to equalities.
+ Assuming ≈ is an equivalence relation, the simplifier should also be able
+ to rewrite terms according to ≈. Since equivalence relations occur
+ frequently in automated reasoning, this extension would make the simplifier
+ more powerful and useful. The hope is that your code can go into the
+ code base of Isabelle.
+ </p>
+
+ <p>
+ <B>Tasks:</B>
+ Read the <A HREF="http://www.springerlink.com/content/x7041m1807738832/">paper</A>
+ about rewriting with equivalence relations. Get familiar with parts of the
+ implementation of Isabelle (I will be of as much help as I can). Implement
+ the extension. This project is suitable for a student with a bit of math background.
+ It requires knowledge of the functional programming language ML, which
+ however can be learned quickly provided you have already written code
+ in another functional programming language.
+ </p>
- References:
- J. Kärkkäinen and P. Sanders, Simple linear work suffix array construction, in ICALP'03, LNCS 2719, Spinger, 2003, pp. 943--955.
- M. Crochemore, J. Désarménien and D. Perrin, A note on the Burrows-Wheeler transformation, Theoret. Comput. Sci., 2005, to appear.
+ <p>
+ <B>Literature:</B> A good starting point for reading about rewriting modulo equivalences
+ is the paper <A HREF="http://www.springerlink.com/content/x7041m1807738832/">here</A>,
+ which uses the ACL2 theorem prover. The implementation of the Isabelle theorem
+ prover is described in much detail in this
+ <A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Cookbook/">programming tutorial</A>.
+ The standard reference for ML is
+ <A HREF="http://www.cl.cam.ac.uk/~lp15/MLbook/">here</A> (I can lend you my copy
+ of this book for the duration of the project).
+ </p>
- There is a horrendously complicated algorithm for solving these problems.
- Your task would be to understand it, and then implement it.
+
+<li><h4>[CU5] Lexing and Parsing with Derivatives</h4>
-<li> <H5>[CU 4] Simplification modulo Equivalences in Isabelle</H5>
- In this project you have to extend the simplifier of the Isabelle theorem
- prover. Currently, the simplifier only rewrites terms according to equalities
- l = r. Provided ~ is an equivalence relation, the simplifier should also
- be able to rewrite terms according to equivalences of the form l ~ r.
- This project requires knowledge of the functional programming language ML.
+ <p>
+ <B>Description:</B>
+ Lexing and parsing are usually done using automated tools, like
+ <A HREF="http://en.wikipedia.org/wiki/Lex_programming_tool">lex</A> and
+ <A HREF="http://en.wikipedia.org/wiki/Yacc">yacc</A>. The problem
+ with them is that they "work when they work", but if not, they are
+ <A HREF="http://en.wikipedia.org/wiki/Black_box">black boxes</A>
+ which are difficult to debug and change. They are really quite
+ clumsy, to the point that Might wrote a paper titled
+ "<A HREF="http://arxiv.org/pdf/1010.5023v1">Yacc is dead</A>".</p>
+
+ <p>
+ There is a simple algorithm for regular expression matching (that is, lexing).
+ This algorithm was introduced by
+ <A HREF="http://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)">Brzozowski</A>
+ in 1964. It is based on the notion of derivatives of regular expressions and
+ has proved <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">useful</A>
+ for practical lexing. Last year the notion of derivatives was extended by
+ <A HREF="http://matt.might.net/papers/might2011derivatives.pdf">Might et al.</A>
+ to <A HREF="http://en.wikipedia.org/wiki/Context-free_grammar">context-free grammars</A>
+ and parsing.
+ </p>
+
+ <p>
+ <B>Tasks:</B> Get familiar with the two algorithms and implement them. Regular
+ expression matching is relatively simple; parsing with derivatives is the
+ harder part. Therefore you should empirically evaluate this part and
+ tune your implementation. The project can be carried out in almost all programming
+ languages, including C, Java, Scala, ML, Haskell and so on.
+ </p>
-<li><h5>[CU 5] Parsing with Derivatives</h5>
+ <p>
+ <B>Literature:</B> This
+ <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">paper</A>
+ gives a modern introduction to derivative-based lexing. Derivative-based
+ parsing is explained <A HREF="http://arxiv.org/pdf/1010.5023v1">here</A>
+ and <A HREF="http://matt.might.net/papers/might2011derivatives.pdf">here</A>.
+ </p>
+
+<li> <H4>[CU6] Equivalence Checking of Regular Expressions using the Method by Antimirov and Mosses</H4>
- Derivatives can be used to implement a regular expression matcher. In
- this project you have to apply this technique to parsing. The starting
- point for this project is the paper "Yacc is Dead" by Matthew Might.
+ <p>
+ <B>Description:</B>
+ A procedure for deciding the equivalence of regular expressions can be used
+ to decide a number of problems in automated reasoning. Therefore one would like to
+ have a method for equivalence checking that is as fast as possible.
+ </p>
+
+ <p>
+ <B>Tasks:</B>
+ The task is to implement the algorithm by Antimirov and Mosses and compare it to
+ other methods. Hopefully the algorithm can be tuned to be faster than other
+ methods.
+ </p>
-<li> <H5>[CU 6] Equivalence Checking of Regular Expression using Antimirov's Method<H5>
+ <p>
+ <B>Literature:</B>
+ Central to this project is the paper <A HREF="http://www.dcc.fc.up.pt/~nam/publica/ijcs08.pdf">here</A>.
+ Other methods have been described, for example,
+ <A HREF="http://www4.informatik.tu-muenchen.de/~krauss/papers/rexp.pdf">here</A>.
+ </p>
</ul>
</TD>
@@ -157,7 +263,7 @@
<P><!-- Created: Tue Mar 4 00:23:25 GMT 1997 -->
<!-- hhmts start -->
-Last modified: Thu Dec 1 18:10:37 GMT 2011
+Last modified: Fri Dec 2 03:26:32 GMT 2011
<!-- hhmts end -->
<a href="http://validator.w3.org/check/referer">[Validate this page.]</a>
</BODY>