projects.html
changeset 44 790a40046dc8
parent 43 a6c077ba850a
child 47 e0d36fd0a8fd
--- a/projects.html	Thu Dec 01 23:36:19 2011 +0000
+++ b/projects.html	Fri Dec 02 03:28:02 2011 +0000
@@ -31,6 +31,7 @@
 <H2>2011/12 MSc Individual Projects</H2>
 <H4>Supervisor: Christian Urban</H4> 
 <H4>Email: @kcl   Office: Strand Building S6.30</H4>
+<H4>If you are interested in a project, please send me email and we can discuss details.</H4> 
 
 <ul class="striped">
 <li> <H4>[CU1] Implementing a SAT-Solver in a Functional Programming Language</H4>
@@ -51,7 +52,8 @@
   <A HREF="http://caml.inria.fr/">OCaml</A>, ... are also OK). Starting point is 
   the open source SAT-solver MiniSat (available <A HREF="http://minisat.se/Main.html">here</A>). 
   The long-term hope is that your implementation becomes part of the interactive theorem prover 
-  <A HREF="http://www.cl.cam.ac.uk/research/hvg/isabelle/">Isabelle</A>.</p> 
+  <A HREF="http://www.cl.cam.ac.uk/research/hvg/isabelle/">Isabelle</A>. For this
+  the SAT-solver needs to be implemented in ML.</p> 
 
   <p>
   <B>Tasks:</B> Understand MiniSat, design and code a SAT-solver in ML, 
@@ -59,7 +61,7 @@
 
   <p>
   <B>Literature:</B> A good starting point for reading about SAT-solving is the handbook
-  article in <A HREF="http://www.cs.cornell.edu/gomes/papers/SATSolvers-KR-Handbook.pdf">here</A>.
+  article <A HREF="http://www.cs.cornell.edu/gomes/papers/SATSolvers-KR-Handbook.pdf">here</A>.
   MiniSat is explained <A HREF="http://minisat.se/downloads/MiniSat.pdf">here</A> and
   <A HREF="http://minisat.se/Papers.html">here</A>. The standard reference for ML is
   <A HREF="http://www.cl.cam.ac.uk/~lp15/MLbook/">here</A> (I can lend you my copy 
@@ -79,7 +81,7 @@
   <A HREF="http://en.wikipedia.org/wiki/Typed_assembly_language">TAL</A>) or an abstract machine.
   This has been explained in full detail in a PhD-thesis by  Louis-Julien Guillemette
   (available in English <A HREF="https://papyrus.bib.umontreal.ca/jspui/bitstream/1866/3454/6/Guillemette_Louis-Julien_2009_these.pdf">here</A>). He used <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>
-  as his implementation language. Other choices are of course possible.
+  as his implementation language. Other choices are possible.
   </p>
 
   <p>
@@ -88,10 +90,10 @@
   (parser, intermediate languages, simulator for the idealised assembly language).
   This project is for a good student with an interest in programming languages,
   who can also translate abstract ideas into code. If it is too difficult, the project can
-  easily be scaled back to the 
+  be easily scaled down to the 
   <A HREF="http://en.wikipedia.org/wiki/Simply_typed_lambda_calculus">simply-typed 
   lambda calculus</A> (which is simpler than
-  System F) or only some components of the compiler are implemented.
+  System F) or to cover only some components of the compiler.
   </p> 
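+
+  <p>
+  To give a flavour of the scaled-down variant, here is a minimal sketch in
+  Scala (Haskell, ML or another language would work just as well; all names are
+  only illustrative) of the datatypes and the type checker a front-end for the
+  simply-typed lambda calculus would start from:
+  </p>
+
+  <pre>
+  // types and terms of the simply-typed lambda calculus
+  sealed trait Ty
+  case object Base extends Ty                             // a single base type
+  case class Arrow(from: Ty, to: Ty) extends Ty           // function types
+
+  sealed trait Tm
+  case class Var(x: String) extends Tm
+  case class Lam(x: String, ty: Ty, body: Tm) extends Tm  // \x:ty. body
+  case class App(f: Tm, arg: Tm) extends Tm
+
+  // type checking under a context assigning types to free variables
+  def typeOf(ctx: Map[String, Ty], t: Tm): Option[Ty] = t match {
+    case Var(x)        => ctx.get(x)
+    case Lam(x, ty, b) => typeOf(ctx + (x -> ty), b).map(Arrow(ty, _))
+    case App(f, a)     =>
+      (typeOf(ctx, f), typeOf(ctx, a)) match {
+        case (Some(Arrow(t1, t2)), Some(t3)) if t1 == t3 => Some(t2)
+        case _                                           => None
+      }
+  }
+  </pre>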
 
   <p>
@@ -107,48 +109,152 @@
   <li> <H4>[CU3] Sorting Suffixes</H4>
   
   <p><b>Description:</b> Given a string, take all its suffixes, and sort them.
-  This is often also called <A HREF="http://en.wikipedia.org/wiki/Suffix_array">suffix 
+  This is often called <A HREF="http://en.wikipedia.org/wiki/Suffix_array">suffix 
  array sorting</A>. It sounds simple, but there are some difficulties. 
-  The naive algorithm would generate all (suffix) strings and sort them
-  using a standard sorting algorithm, for example quick-sort. Unfortunately,
-  this algorithm is not optimal (it does not take into account that you sort
-  suffixes) and it also takes an quadratic amount of space, which is a 
-  problem if you have to sort strings of several Mega-Bytes or even Giga-Bytes 
-  (happens often in biotech DNA information.<p> 
+  The naive algorithm would generate all suffix strings and sort them
+  using a standard sorting algorithm, for example 
+  <A HREF="http://en.wikipedia.org/wiki/Quicksort">quicksort</A>. 
+  The problem is that this algorithm is not optimal for suffix sorting: it does
+  not take into account that you are sorting suffixes, and it also takes a
+  quadratic amount of space. This is a huge problem if you have to sort strings of
+  several megabytes or even gigabytes, as often happens in biotech and DNA data
+  mining. Suffix sorting is also a crucial operation for the 
+  <A HREF="http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform">Burrows-Wheeler transform</A>
+  on which the data compression algorithm of the popular 
+  <A HREF="http://en.wikipedia.org/wiki/Bzip2">bzip2</A>
+  program is based.
+  </p>
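+
+  <p>
+  To make the starting point concrete, here is a small sketch of the naive
+  approach in Scala (any of the languages mentioned below would do; the names
+  are only illustrative). It builds every suffix explicitly and hands the list
+  to a library sort, so it already shows the quadratic space usage described
+  above:
+  </p>
+
+  <pre>
+  // naive suffix sorting: build every suffix and sort lexicographically;
+  // for a string of length n this materialises O(n^2) characters
+  def naiveSuffixArray(s: String): Array[Int] =
+    (0 until s.length)
+      .map(i => (s.substring(i), i))   // (suffix, starting position)
+      .sorted                          // standard comparison-based sort
+      .map(_._2)                       // keep only the starting positions
+      .toArray
+
+  // naiveSuffixArray("banana").toList == List(5, 3, 1, 0, 4, 2)
+  </pre>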
 
-  Aim: the notion of index on a text is central in many methods for text
-  processing and for the management of textual databases. Suffix Arrays is one
-  of these methods based on the sorted list of suffixes of the input text. The
-  project consists in implementing a linear-time sorting algorithm and other
-  elements related to Suffix Array construction and to Burrows-Wheeler text
-  compression. Plan: study of the sorting problem in the literature starting
-  with the reference below. Implementation of the sorting algorithm and the
-  LCP computation to obtain a Suffix Array construction software. Then, using
-  this work, implementation of the algorithms described in the second
-  reference below. Deliverables: report, suffix sorting and associated
-  software and their documentation.
+  <p>
+  There are more efficient algorithms for suffix sorting, for example 
+  <A HREF="http://books.google.co.uk/books?id=Pn1sHToYf9oC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false">here</A> and 
+  <A HREF="http://ls11-www.cs.uni-dortmund.de/people/rahmann/teaching/ss2008/AlgorithmenAufSequenzen/09-walk-bwt.pdf">here</A>. 
+  However, the most space-efficient algorithm for suffix sorting 
+  (<A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">here</A>) 
+  is horrendously complicated. Your task would be to understand it, and then implement it.
+  </p>
+  
+  <p>
+  <B>Tasks:</B>
+  Start by reading the literature about suffix sorting. Then work through the
+  12-page <A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">paper</A> 
+  explaining this algorithm and implement it.
+  Time permitting, the work can include an implementation of Burrows-Wheeler 
+  data compression. This project is for a good student who likes to study 
+  algorithms in depth. The project can be carried out in almost all programming languages,
+  including C, Java, Scala, ML, Haskell and so on.
+  </p>
+
+  <p>
+  <B>Literature:</B> A good starting point for reading about suffix sorting is the 
+  <A HREF="http://books.google.co.uk/books?id=Pn1sHToYf9oC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false">book</A> by Crochemore. Two simple algorithms are also described
+  <A HREF="http://ls11-www.cs.uni-dortmund.de/people/rahmann/teaching/ss2008/AlgorithmenAufSequenzen/09-walk-bwt.pdf">here</A>. The main literature is the 12-page
+  <A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">article</A> about in-place
+  suffix sorting. The Burrows-Wheeler data compression is described 
+  <A HREF="http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.pdf">here</A>.
+  </p>
+
+<li> <H4>[CU4] Simplification with Equivalence Relations in the Isabelle Theorem Prover</H4>
+  <p>
+  <B>Description:</B>
+  In this project you have to extend the simplifier of the 
+  <A HREF="http://isabelle.in.tum.de/">Isabelle theorem prover</A>.  
+  The simplifier is an important reasoning tool of this theorem prover: it 
+  replaces a term by another term that can be proved to be equal to it. However, 
+  currently the simplifier only rewrites terms according to equalities. 
+  Assuming &asymp; is an equivalence relation, the simplifier should also be able 
+  to rewrite terms according to &asymp;. Since equivalence relations occur 
+  frequently in automated reasoning, this extension would make the simplifier 
+  more powerful and useful. The hope is that your code can go into the
+  code base of Isabelle.
+  </p>
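+
+  <p>
+  To illustrate what rewriting with an equivalence relation means, here is a
+  small toy example in Scala (it is not Isabelle code and not the simplifier's
+  actual interface). Take arithmetic terms and let x &asymp; y mean that x and y
+  are congruent modulo some fixed m. Since + and * respect this relation, every
+  numeral may be replaced by its representative modulo m, and the resulting term
+  is &asymp;-equal to the original even though it is not equal to it; keeping
+  track of such justifications is exactly what the extended simplifier has to do:
+  </p>
+
+  <pre>
+  sealed trait Term
+  case class Num(n: Int) extends Term
+  case class Add(a: Term, b: Term) extends Term
+  case class Mul(a: Term, b: Term) extends Term
+
+  // rewrite every numeral to its representative modulo m; justified by the
+  // equivalence n ~ (n mod m) plus the fact that Add and Mul are congruences
+  def simpMod(m: Int)(t: Term): Term = t match {
+    case Num(n)    => Num(((n % m) + m) % m)
+    case Add(a, b) => Add(simpMod(m)(a), simpMod(m)(b))
+    case Mul(a, b) => Mul(simpMod(m)(a), simpMod(m)(b))
+  }
+
+  // simpMod(5)(Add(Num(7), Mul(Num(12), Num(3)))) == Add(Num(2), Mul(Num(2), Num(3)))
+  </pre>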
+
+  <p>
+  <B>Tasks:</B>	
+  Read the <A HREF="http://www.springerlink.com/content/x7041m1807738832/">paper</A>
+  about rewriting with equivalence relations. Get familiar with parts of the 
+  implementation of Isabelle (I will help as much as I can). Implement
+  the extension. This project is suitable for a student with a bit of math background.
+  It requires knowledge of the functional programming language ML, which,
+  however, can be learned quickly provided you have already written code
+  in another functional programming language.
+  </p>
 
-  References: 
-  J. Kärkkäinen and P. Sanders,  Simple linear work suffix array construction, in ICALP'03, LNCS 2719, Spinger, 2003, pp. 943--955. 
-  M. Crochemore, J. Désarménien and D. Perrin,  A note on the Burrows-Wheeler transformation, Theoret. Comput. Sci., 2005, to appear.
+  <p>
+  <B>Literature:</B> A good starting point for reading about rewriting modulo equivalences 
+  is the paper <A HREF="http://www.springerlink.com/content/x7041m1807738832/">here</A>, 
+  which uses the ACL2 theorem prover. The implementation of the Isabelle theorem
+  prover is described in much detail in this 
+  <A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Cookbook/">programming tutorial</A>.
+  The standard reference for ML is
+  <A HREF="http://www.cl.cam.ac.uk/~lp15/MLbook/">here</A> (I can lend you my copy 
+  of this book for the duration of the project).
+  </p>
 
-  There is a horrendously complicated algorithm for solving these problems. 
-  Your task would be to understand it, and then implement it.
+
+<li><h4>[CU5] Lexing and Parsing with Derivatives</h4>
 
-<li> <H5>[CU 4] Simplification modulo Equivalences in Isabelle</H5>
-  In this project you have to extend the simplifier of the Isabelle theorem 
-  prover.  Currently, the simplifier only rewrites terms according to equalities 
-  l = r. Provided ~ is an equivalence relation, the simplifier should also 
-  be able to rewrite terms according to equivalences of the form l ~ r.
-  This project requires knowledge of the functional programming language ML.
+  <p>
+  <B>Description:</B>
+  Lexing and parsing are usually done using automated tools, like 
+  <A HREF="http://en.wikipedia.org/wiki/Lex_programming_tool">lex</A> and 
+  <A HREF="http://en.wikipedia.org/wiki/Yacc">yacc</A>. The problem 
+  with them is that they "work when they work", but when they do not, they are
+  <A HREF="http://en.wikipedia.org/wiki/Black_box">black boxes</A>
+  that are difficult to debug and change. They are really quite 
+  clumsy, to the point that Might wrote a paper titled 
+  "<A HREF="http://arxiv.org/pdf/1010.5023v1">Yacc is dead</A>".</p>
+ 
+  <p>
+  There is a simple algorithm for regular expression matching (that is, lexing).
+  This algorithm was introduced by 
+  <A HREF="http://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)">Brzozowski</A> 
+  in 1964. It is based on the notion of derivatives of regular expressions and 
+  has proved <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">useful</A> 
+  for practical lexing. Last year the notion of derivatives was extended by 
+  <A HREF="http://matt.might.net/papers/might2011derivatives.pdf">Might et al</A>
+  to <A HREF="http://en.wikipedia.org/wiki/Context-free_grammar">context-free grammars</A> 
+  and parsing.
+  </p>		      
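+
+  <p>
+  To show how small the matching algorithm is, here is a sketch of Brzozowski's
+  method in Scala (a bare-bones regular expression language without any of the
+  simplifications needed for efficiency in practice; any of the languages
+  mentioned below would do):
+  </p>
+
+  <pre>
+  sealed trait Rexp
+  case object Zero extends Rexp                       // matches nothing
+  case object One extends Rexp                        // matches only the empty string
+  case class Chr(c: Char) extends Rexp                // matches the character c
+  case class Alt(r1: Rexp, r2: Rexp) extends Rexp     // alternative  r1 + r2
+  case class Cat(r1: Rexp, r2: Rexp) extends Rexp     // sequence     r1 r2
+  case class Star(r: Rexp) extends Rexp               // iteration    r*
+
+  // can r match the empty string?
+  def nullable(r: Rexp): Boolean = r match {
+    case Zero | Chr(_) => false
+    case One | Star(_) => true
+    case Alt(r1, r2)   => nullable(r1) || nullable(r2)
+    case Cat(r1, r2)   => nullable(r1) && nullable(r2)
+  }
+
+  // Brzozowski derivative: what is left of r after it has matched c
+  def der(c: Char, r: Rexp): Rexp = r match {
+    case Zero | One  => Zero
+    case Chr(d)      => if (c == d) One else Zero
+    case Alt(r1, r2) => Alt(der(c, r1), der(c, r2))
+    case Cat(r1, r2) =>
+      if (nullable(r1)) Alt(Cat(der(c, r1), r2), der(c, r2))
+      else Cat(der(c, r1), r2)
+    case Star(r1)    => Cat(der(c, r1), Star(r1))
+  }
+
+  // matching: take derivatives character by character, then test nullability
+  def matches(r: Rexp, s: String): Boolean =
+    nullable(s.foldLeft(r)((r1, c) => der(c, r1)))
+
+  // matches(Star(Alt(Chr('a'), Chr('b'))), "abba") == true
+  </pre>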
+  
+  <p>
+  <B>Tasks:</B> Get familiar with the two algorithms and implement them. Regular
+  expression matching is relatively simple; parsing with derivatives is the 
+  harder part. Therefore you should empirically evaluate this part and
+  tune your implementation. The project can be carried out in almost all programming 
+  languages, including C, Java, Scala, ML, Haskell and so on.
+  </p>
 
-<li><h5>[CU 5] Parsing with Derivatives</h5>
+  <p>
+  <B>Literature:</B> This 
+  <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">paper</A> 
+  gives a modern introduction to derivative based lexing. Derivative-based
+  parsing is explained <A HREF="http://arxiv.org/pdf/1010.5023v1">here</A>
+  and <A HREF="http://matt.might.net/papers/might2011derivatives.pdf">here</A>.
+  </p>  
+
+<li> <H4>[CU6] Equivalence Checking of Regular Expressions using the Method by Antimirov and Mosses</H4>
 
-  Derivatives can be used to implement a regular expression matcher. In 
-  this project you have to apply this technique to parsing. The starting 
-  point for this project is the paper "Yacc is Dead" by Matthew Might.
+  <p>
+  <B>Description:</B> 
+  Deciding the equivalence of regular expressions can be used to solve
+  a number of problems in automated reasoning. Therefore one would like to
+  have a method for equivalence checking that is as fast as possible. 
+  </p>		      
+  
+  <p>
+  <B>Tasks:</B>
+  The task is to implement the algorithm by Antimirov and Mosses and compare it
+  with other methods. Hopefully it can be tuned to be faster than the
+  alternatives.
+  </p>
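+
+  <p>
+  To indicate the flavour of derivative-based equivalence checking, here is a
+  sketch in Scala of a bisimulation check built on Antimirov's partial
+  derivatives. It reuses the Rexp datatype and the nullable function from the
+  [CU5] sketch above, and it is a simplified relative of, not the same as, the
+  rewriting method by Antimirov and Mosses:
+  </p>
+
+  <pre>
+  // Antimirov's partial derivative: a *set* of regular expressions; every
+  // regular expression has only finitely many iterated partial derivatives,
+  // which is what makes the search below terminate
+  def pder(c: Char, r: Rexp): Set[Rexp] = r match {
+    case Zero | One  => Set[Rexp]()
+    case Chr(d)      => if (c == d) Set[Rexp](One) else Set[Rexp]()
+    case Alt(r1, r2) => pder(c, r1) ++ pder(c, r2)
+    case Cat(r1, r2) =>
+      val left: Set[Rexp] = pder(c, r1).map(p => Cat(p, r2))
+      if (nullable(r1)) left ++ pder(c, r2) else left
+    case Star(r1)    => pder(c, r1).map(p => Cat(p, Star(r1)): Rexp)
+  }
+
+  // lift nullability and derivatives to sets of regular expressions (read as sums)
+  def nullableS(rs: Set[Rexp]): Boolean = rs.exists(nullable)
+  def pderS(c: Char, rs: Set[Rexp]): Set[Rexp] = rs.flatMap(pder(c, _))
+
+  // r1 and r2 are equivalent (over the given alphabet, which must contain all
+  // characters occurring in them) iff no reachable pair of derivative sets
+  // disagrees on nullability
+  def equiv(r1: Rexp, r2: Rexp, alphabet: Set[Char]): Boolean = {
+    val seen = scala.collection.mutable.Set[(Set[Rexp], Set[Rexp])]()
+    def check(a: Set[Rexp], b: Set[Rexp]): Boolean =
+      seen((a, b)) || {
+        if (nullableS(a) != nullableS(b)) false
+        else {
+          seen += ((a, b))
+          alphabet.forall(c => check(pderS(c, a), pderS(c, b)))
+        }
+      }
+    check(Set(r1), Set(r2))
+  }
+
+  // equiv(Star(Chr('a')), Alt(One, Cat(Chr('a'), Star(Chr('a')))), Set('a')) == true
+  </pre>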
 
-<li> <H5>[CU 6] Equivalence Checking of Regular Expression using Antimirov's Method<H5>
+  <p>
+  <B>Literature:</B>
+  Central to this project is the paper <A HREF="http://www.dcc.fc.up.pt/~nam/publica/ijcs08.pdf">here</A>.
+  Other methods have been described, for example, 
+  <A HREF="http://www4.informatik.tu-muenchen.de/~krauss/papers/rexp.pdf">here</A>.
+  </p>  
 
 </ul>
 </TD>
@@ -157,7 +263,7 @@
 
 <P><!-- Created: Tue Mar  4 00:23:25 GMT 1997 -->
 <!-- hhmts start -->
-Last modified: Thu Dec  1 18:10:37 GMT 2011
+Last modified: Fri Dec  2 03:26:32 GMT 2011
 <!-- hhmts end -->
 <a href="http://validator.w3.org/check/referer">[Validate this page.]</a>
 </BODY>