projects.html
changeset 44 790a40046dc8
parent 43 a6c077ba850a
child 47 e0d36fd0a8fd
--- a/projects.html (43:a6c077ba850a)
+++ b/projects.html (44:790a40046dc8)
@@ -29,10 +29,11 @@
     VALIGN="TOP">
 
 <H2>2011/12 MSc Individual Projects</H2>
 <H4>Supervisor: Christian Urban</H4>
 <H4>Email: @kcl   Office: Strand Building S6.30</H4>
+<H4>If you are interested in a project, please send me email and we can discuss details.</H4>
 
 <ul class="striped">
 <li> <H4>[CU1] Implementing a SAT-Solver in a Functional Programming Language</H4>
 
   <p><B>Description:</b>
@@ -49,19 +50,20 @@
   <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>,
   <A HREF="http://www.scala-lang.org/">Scala</A>,
   <A HREF="http://caml.inria.fr/">OCaml</A>, ... are also OK). Starting point is
   the open source SAT-solver MiniSat (available <A HREF="http://minisat.se/Main.html">here</A>).
   The long-term hope is that your implementation becomes part of the interactive theorem prover
-  <A HREF="http://www.cl.cam.ac.uk/research/hvg/isabelle/">Isabelle</A>.</p>
+  <A HREF="http://www.cl.cam.ac.uk/research/hvg/isabelle/">Isabelle</A>. For this,
+  the SAT-solver needs to be implemented in ML.</p>
 
   <p>
   <B>Tasks:</B> Understand MiniSat, design and code a SAT-solver in ML,
   empirical evaluation and tuning of your code.</p>
 
   <p>
   <B>Literature:</B> A good starting point for reading about SAT-solving is the handbook
-  article in <A HREF="http://www.cs.cornell.edu/gomes/papers/SATSolvers-KR-Handbook.pdf">here</A>.
+  article <A HREF="http://www.cs.cornell.edu/gomes/papers/SATSolvers-KR-Handbook.pdf">here</A>.
   MiniSat is explained <A HREF="http://minisat.se/downloads/MiniSat.pdf">here</A> and
   <A HREF="http://minisat.se/Papers.html">here</A>. The standard reference for ML is
   <A HREF="http://www.cl.cam.ac.uk/~lp15/MLbook/">here</A> (I can lend you my copy
   of this book for the duration of the project). The best free implementation of ML is
   <A HREF="http://www.polyml.org/">PolyML</A>.
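The MiniSat starting point of the SAT-solver project (CU1) builds on one basic search procedure, and a small example may make it more concrete. Below is a minimal sketch of classic DPLL, with unit propagation and chronological backtracking. It is written in Python purely for illustration, whereas the project itself asks for ML. The sketch leaves out techniques MiniSat relies on, such as conflict-driven clause learning and watched literals. The DIMACS-style integer encoding of literals and the names `assign` and `dpll` are assumptions of this sketch.

```python
from typing import Dict, List, Optional

# DIMACS-style literals (an assumption of this sketch): 3 means "x3 is true",
# -3 means "x3 is false". A formula is a list of clauses in CNF.
Clause = List[int]
Assignment = Dict[int, bool]


def assign(clauses: List[Clause], lit: int) -> Optional[List[Clause]]:
    """Make `lit` true: drop satisfied clauses and delete -lit from the rest.
    Returns None if some clause becomes empty, i.e. on a conflict."""
    result = []
    for clause in clauses:
        if lit in clause:
            continue  # clause is already satisfied by lit
        reduced = [other for other in clause if other != -lit]
        if not reduced:
            return None  # every literal of this clause is now false
        result.append(reduced)
    return result


def dpll(clauses: List[Clause], model: Optional[Assignment] = None) -> Optional[Assignment]:
    """Return a (possibly partial) satisfying assignment, or None if the
    formula is unsatisfiable. Variables missing from the result are free."""
    model = dict(model or {})
    if any(not clause for clause in clauses):
        return None  # an empty clause can never be satisfied
    # Unit propagation: a one-literal clause forces that literal.
    while True:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        model[abs(unit)] = unit > 0
        propagated = assign(clauses, unit)
        if propagated is None:
            return None
        clauses = propagated
    if not clauses:
        return model  # all clauses satisfied
    # Branch on the first literal of the first clause; backtrack on failure.
    lit = clauses[0][0]
    for choice in (lit, -lit):
        reduced = assign(clauses, choice)
        if reduced is not None:
            result = dpll(reduced, {**model, abs(choice): choice > 0})
            if result is not None:
                return result
    return None
```

For example, `dpll([[1, 2], [-1, 2], [-2, 3]])` returns an assignment that makes x2 and x3 true, and `dpll([[1], [-1]])` returns `None`.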
@@ -77,23 +79,23 @@
   enough to implement in a reasonable amount of time a compiler to an
   idealised assembly language (preferably
   <A HREF="http://en.wikipedia.org/wiki/Typed_assembly_language">TAL</A>) or an abstract machine.
   This has been explained in full detail in a PhD-thesis by  Louis-Julien Guillemette
   (available in English <A HREF="https://papyrus.bib.umontreal.ca/jspui/bitstream/1866/3454/6/Guillemette_Louis-Julien_2009_these.pdf">here</A>). He used <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>
-  as his implementation language. Other choices are of course possible.
+  as his implementation language. Other choices are possible.
   </p>
 
   <p>
   <b>Tasks:</b>
   Read the relevant literature and implement the various components of a compiler
   (parser, intermediate languages, simulator for the idealised assembly language).
   This project is for a good student with an interest in programming languages,
   who can also translate abstract ideas into code. If it is too difficult, the project can
-  easily be scaled back to the
+  be easily scaled down to the
   <A HREF="http://en.wikipedia.org/wiki/Simply_typed_lambda_calculus">simply-typed
   lambda calculus</A> (which is simpler than
-  System F) or only some components of the compiler are implemented.
+  System F) or to cover only some components of the compiler.
   </p>
 
   <p>
   <B>Literature:</B>
   The <A HREF="https://papyrus.bib.umontreal.ca/jspui/bitstream/1866/3454/6/Guillemette_Louis-Julien_2009_these.pdf">PhD-thesis</A> by  Louis-Julien Guillemette is required reading. A shorter
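The compiler project above targets System F and typed assembly language, which is far beyond a short example. To show only how the listed components (parser, intermediate representation, simulator) fit together, here is a toy pipeline in Python. It compiles arithmetic expressions to a hypothetical stack machine. The grammar, the `PUSH`/`ADD`/`MUL` instruction set and all function names are invented for this sketch. None of it is TAL.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str  # "+" or "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]
Instr = Tuple[str, int]  # hypothetical (opcode, operand); only PUSH uses its operand


def parse(src: str) -> Expr:
    """Recursive descent for: expr ::= term ('+' term)*,
    term ::= atom ('*' atom)*, atom ::= number | '(' expr ')'."""
    tokens = re.findall(r"\d+|[()+*]", src)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def advance() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expr() -> Expr:
        node = term()
        while peek() == "+":
            advance()
            node = BinOp("+", node, term())
        return node

    def term() -> Expr:
        node = atom()
        while peek() == "*":
            advance()
            node = BinOp("*", node, atom())
        return node

    def atom() -> Expr:
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return Num(int(tok))

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree


def compile_expr(e: Expr) -> List[Instr]:
    """Post-order traversal: both operands are pushed before their operator."""
    if isinstance(e, Num):
        return [("PUSH", e.value)]
    opcode = "ADD" if e.op == "+" else "MUL"
    return compile_expr(e.left) + compile_expr(e.right) + [(opcode, 0)]


def run(code: List[Instr]) -> int:
    """Simulator for the hypothetical stack machine."""
    stack: List[int] = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if opcode == "ADD" else left * right)
    return stack.pop()
```

`run(compile_expr(parse("2 * (3 + 4) + 5")))` evaluates to 19. In the project proper, the single `compile_expr` step would be replaced by a series of intermediate languages.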
@@ -105,47 +107,151 @@
   </p>
 
   <li> <H4>[CU3] Sorting Suffixes</H4>
 
   <p><b>Description:</b> Given a string, take all its suffixes, and sort them.
-  This is often also called <A HREF="http://en.wikipedia.org/wiki/Suffix_array">suffix
+  This is often called <A HREF="http://en.wikipedia.org/wiki/Suffix_array">suffix
   array sorting</A>. It sounds simple, but there are some difficulties.
-  The naive algorithm would generate all (suffix) strings and sort them
-  using a standard sorting algorithm, for example quick-sort. Unfortunately,
-  this algorithm is not optimal (it does not take into account that you sort
-  suffixes) and it also takes an quadratic amount of space, which is a
-  problem if you have to sort strings of several Mega-Bytes or even Giga-Bytes
-  (happens often in biotech DNA information.<p>
-
-  Aim: the notion of index on a text is central in many methods for text
-  processing and for the management of textual databases. Suffix Arrays is one
-  of these methods based on the sorted list of suffixes of the input text. The
-  project consists in implementing a linear-time sorting algorithm and other
-  elements related to Suffix Array construction and to Burrows-Wheeler text
-  compression. Plan: study of the sorting problem in the literature starting
-  with the reference below. Implementation of the sorting algorithm and the
-  LCP computation to obtain a Suffix Array construction software. Then, using
-  this work, implementation of the algorithms described in the second
-  reference below. Deliverables: report, suffix sorting and associated
-  software and their documentation.
-
-  References:
-  J. Kärkkäinen and P. Sanders,  Simple linear work suffix array construction, in ICALP'03, LNCS 2719, Spinger, 2003, pp. 943--955.
-  M. Crochemore, J. Désarménien and D. Perrin,  A note on the Burrows-Wheeler transformation, Theoret. Comput. Sci., 2005, to appear.
-
-  There is a horrendously complicated algorithm for solving these problems.
-  Your task would be to understand it, and then implement it.
-
-<li> <H5>[CU 4] Simplification modulo Equivalences in Isabelle</H5>
-  In this project you have to extend the simplifier of the Isabelle theorem
-  prover.  Currently, the simplifier only rewrites terms according to equalities
-  l = r. Provided ~ is an equivalence relation, the simplifier should also
-  be able to rewrite terms according to equivalences of the form l ~ r.
-  This project requires knowledge of the functional programming language ML.
-
-<li><h5>[CU 5] Parsing with Derivatives</h5>
-
-  Derivatives can be used to implement a regular expression matcher. In
-  this project you have to apply this technique to parsing. The starting
-  point for this project is the paper "Yacc is Dead" by Matthew Might.
-
-<li> <H5>[CU 6] Equivalence Checking of Regular Expression using Antimirov's Method<H5>
+  The naive algorithm would generate all suffix strings and sort them
+  using a standard sorting algorithm, for example
+  <A HREF="http://en.wikipedia.org/wiki/Quicksort">quicksort</A>.
+  The problem is that
+  this algorithm is not optimal for suffix sorting: it does not take into account that you sort
+  suffixes and it also takes a quadratic amount of space. This is a
+  huge problem if you have to sort strings of several Megabytes or even Gigabytes,
+  as happens often in biotech and DNA data mining. Suffix sorting is also a crucial operation for the
+  <A HREF="http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform">Burrows-Wheeler transform</A>
+  on which the data compression algorithm of the popular
+  <A HREF="http://en.wikipedia.org/wiki/Bzip2">bzip2</A>
+  program is based.
+  </p>
+
+  <p>
+  There are more efficient algorithms for suffix sorting, for example
+  <A HREF="http://books.google.co.uk/books?id=Pn1sHToYf9oC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false">here</A> and
+  <A HREF="http://ls11-www.cs.uni-dortmund.de/people/rahmann/teaching/ss2008/AlgorithmenAufSequenzen/09-walk-bwt.pdf">here</A>.
+  However, the most space-efficient algorithm for suffix sorting
+  (<A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">here</A>)
+  is horrendously complicated. Your task would be to understand it, and then implement it.
+  </p>
+
+  <p>
+  <B>Tasks:</B>
+  Start by reading the literature about suffix sorting. Then work through the
+  12-page <A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">paper</A>
+  explaining the horrendously complicated algorithm and implement it.
+  Time permitting, the work can include an implementation of the Burrows-Wheeler
+  data compression. This project is for a good student who likes to study algorithms
+  in depth. The project can be carried out in almost all programming languages,
+  including C, Java, Scala, ML, Haskell and so on.
+  </p>
+
+  <p>
+  <B>Literature:</B> A good starting point for reading about suffix sorting is the
+  <A HREF="http://books.google.co.uk/books?id=Pn1sHToYf9oC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false">book</A> by Crochemore. Two simple algorithms are also described
+  <A HREF="http://ls11-www.cs.uni-dortmund.de/people/rahmann/teaching/ss2008/AlgorithmenAufSequenzen/09-walk-bwt.pdf">here</A>. The main literature is the 12-page
+  <A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">article</A> about in-place
+  suffix sorting. The Burrows-Wheeler data compression is described
+  <A HREF="http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.pdf">here</A>.
+  </p>
+
+<li> <H4>[CU4] Simplification with Equivalence Relations in the Isabelle Theorem Prover</H4>
+  <p>
+  <B>Description:</B>
+  In this project you have to extend the simplifier of the
+  <A HREF="http://isabelle.in.tum.de/">Isabelle theorem prover</A>.
+  The simplifier is an important reasoning tool of this theorem prover: it
+  replaces a term by another term that can be proved to be equal to it. However,
+  currently the simplifier only rewrites terms according to equalities.
+  Assuming &asymp; is an equivalence relation, the simplifier should also be able
+  to rewrite terms according to &asymp;. Since equivalence relations occur
+  frequently in automated reasoning, this extension would make the simplifier
+  more powerful and useful. The hope is that your code can go into the
+  code base of Isabelle.
+  </p>
+
+  <p>
+  <B>Tasks:</B>
+  Read the <A HREF="http://www.springerlink.com/content/x7041m1807738832/">paper</A>
+  about rewriting with equivalence relations. Get familiar with parts of the
+  implementation of Isabelle (I will be of as much help as I can). Implement
+  the extension. This project is suitable for a student with a bit of math background.
+  It requires knowledge of the functional programming language ML, which,
+  however, can be learned quickly if you have already written code
+  in another functional programming language.
+  </p>
+
+  <p>
+  <B>Literature:</B> A good starting point for reading about rewriting modulo equivalences
+  is the paper <A HREF="http://www.springerlink.com/content/x7041m1807738832/">here</A>,
+  which uses the ACL2 theorem prover. The implementation of the Isabelle theorem
+  prover is described in much detail in this
+  <A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Cookbook/">programming tutorial</A>.
+  The standard reference for ML is
+  <A HREF="http://www.cl.cam.ac.uk/~lp15/MLbook/">here</A> (I can lend you my copy
+  of this book for the duration of the project).
+  </p>
+
+
+<li><h4>[CU5] Lexing and Parsing with Derivatives</h4>
+
+  <p>
+  <B>Description:</B>
+  Lexing and parsing are usually done using automated tools, like
+  <A HREF="http://en.wikipedia.org/wiki/Lex_programming_tool">lex</A> and
+  <A HREF="http://en.wikipedia.org/wiki/Yacc">yacc</A>. The problem
+  with them is that they "work when they work", but if not, they are
+  <A HREF="http://en.wikipedia.org/wiki/Black_box">black boxes</A>
+  which are difficult to debug and change. They are really quite
+  clumsy, to the point that Might wrote a paper titled
+  "<A HREF="http://arxiv.org/pdf/1010.5023v1">Yacc is dead</A>".</p>
+
+  <p>
+  There is a simple algorithm for regular expression matching (that is, lexing).
+  This algorithm was introduced by
+  <A HREF="http://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)">Brzozowski</A>
+  in 1964. It is based on the notion of derivatives of regular expressions and
+  has proved <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">useful</A>
+  for practical lexing. Last year the notion of derivatives was extended by
+  <A HREF="http://matt.might.net/papers/might2011derivatives.pdf">Might et al.</A>
+  to <A HREF="http://en.wikipedia.org/wiki/Context-free_grammar">context-free grammars</A>
+  and parsing.
+  </p>
+
+  <p>
+  <B>Tasks:</B> Get familiar with the two algorithms and implement them. Regular
+  expression matching is relatively simple; parsing with derivatives is the
+  harder part. Therefore you should empirically evaluate this part and
+  tune your implementation. The project can be carried out in almost all programming
+  languages, including C, Java, Scala, ML, Haskell and so on.
+  </p>
+
+  <p>
+  <B>Literature:</B> This
+  <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">paper</A>
+  gives a modern introduction to derivative-based lexing. Derivative-based
+  parsing is explained <A HREF="http://arxiv.org/pdf/1010.5023v1">here</A>
+  and <A HREF="http://matt.might.net/papers/might2011derivatives.pdf">here</A>.
+  </p>
+
+<li> <H4>[CU6] Equivalence Checking of Regular Expressions using the Method by Antimirov and Mosses</H4>
+
+  <p>
+  <B>Description:</B>
+  A procedure for deciding the equivalence of regular expressions can be used
+  to decide a number of problems in automated reasoning. Therefore one would like to
+  have a method for equivalence checking that is as fast as possible.
+  </p>
+
+  <p>
+  <B>Tasks:</B>
+  The task is to implement the algorithm by Antimirov and Mosses and compare it to
+  other methods. Hopefully the algorithm can be tuned to be faster than other
+  methods.
+  </p>
+
+  <p>
+  <B>Literature:</B>
+  Central to this project is the paper <A HREF="http://www.dcc.fc.up.pt/~nam/publica/ijcs08.pdf">here</A>.
+  Other methods have been described, for example,
+  <A HREF="http://www4.informatik.tu-muenchen.de/~krauss/papers/rexp.pdf">here</A>.
+  </p>
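For the suffix-sorting project (CU3), the following Python sketch contrasts two approaches. The first is the naive method from the description: materialise every suffix and sort. The second is prefix doubling (in the style of Manber and Myers), which repeatedly re-sorts suffixes by pairs of ranks from the previous round. With Python's comparison sort, prefix doubling runs in O(n log² n) time. It is neither the linear-time algorithm nor the in-place algorithm the project targets. The last function reads the Burrows-Wheeler transform off the suffix array. The function names and the `$` sentinel are assumptions of this sketch.

```python
from typing import List


def naive_suffix_array(text: str) -> List[int]:
    """Sort all suffixes directly: simple, but the slices cost quadratic space."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def doubling_suffix_array(text: str) -> List[int]:
    """Prefix doubling: after the round with step k, suffixes are ordered by
    their first 2k characters, using the ranks from the previous round."""
    n = len(text)
    if n == 0:
        return []
    rank = [ord(ch) for ch in text]  # initial ranks: single characters
    sa = list(range(n))
    k = 1
    while True:
        # Key for this round: rank of the first k characters, then of the next
        # k characters (-1 if the suffix ends first, so the shorter one sorts first).
        keys = [(rank[i], rank[i + k] if i + k < n else -1) for i in range(n)]
        sa.sort(key=lambda i: keys[i])
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (keys[sa[j]] != keys[sa[j - 1]])
        rank = new_rank
        if rank[sa[-1]] == n - 1:
            return sa  # all ranks distinct: the order is final
        k *= 2


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform read off the suffix array of text + sentinel.
    Assumes the sentinel does not occur in text and sorts before its characters."""
    s = text + sentinel
    return "".join(s[i - 1] for i in doubling_suffix_array(s))
```

For example, `doubling_suffix_array("banana")` gives `[5, 3, 1, 0, 4, 2]`, the same as the naive version, and `bwt("banana")` gives `"annb$aa"`.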
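For the Isabelle project (CU4), a hypothetical miniature may help to see why rewriting with an equivalence needs extra care. A rule l ≈ r may only be applied where the surrounding context does not distinguish equivalent terms. The Python sketch below does innermost rewriting and tracks whether the current position requires equality or only ≈. It assumes a single kind of congruence fact, (f, i), meaning that x ≈ y implies f(..x..) = f(..y..). The term encoding, rule format and example rules are invented for illustration. This is neither Isabelle's simplifier nor the method of the cited ACL2 paper.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

# Hypothetical encoding: a term is an atom ("xs") or an application ("rev", "xs").
# Pattern variables in rules are atoms starting with "?".
Term = Union[str, Tuple]
Subst = Dict[str, Term]
Rule = Tuple[Term, Term, str]  # (lhs, rhs, "eq" or "equiv")


def match(pat: Term, term: Term, subst: Subst) -> Optional[Subst]:
    """First-order matching of a pattern against a term."""
    if isinstance(pat, str) and pat.startswith("?"):
        if pat in subst:
            return subst if subst[pat] == term else None
        return {**subst, pat: term}
    if isinstance(pat, str) or isinstance(term, str):
        return subst if pat == term else None
    if len(pat) != len(term) or pat[0] != term[0]:
        return None
    for p, t in zip(pat[1:], term[1:]):
        result = match(p, t, subst)
        if result is None:
            return None
        subst = result
    return subst


def instantiate(t: Term, subst: Subst) -> Term:
    if isinstance(t, str):
        return subst.get(t, t)
    return (t[0],) + tuple(instantiate(a, subst) for a in t[1:])


def rewrite(term: Term, ctx: str, rules: List[Rule],
            congruences: Set[Tuple[str, int]]) -> Term:
    """Innermost rewriting. ctx == "eq": the result must equal the input;
    ctx == "equiv": it need only be equivalent. Argument i of f is rewritten
    in "equiv" context only if (f, i) is a declared congruence."""
    if not isinstance(term, str):
        f = term[0]
        term = (f,) + tuple(
            rewrite(arg, "equiv" if (f, i) in congruences else "eq", rules, congruences)
            for i, arg in enumerate(term[1:], start=1))
    for lhs, rhs, rel in rules:
        if rel == "equiv" and ctx != "equiv":
            continue  # an equivalence rule would be unsound where equality is needed
        subst = match(lhs, term, {})
        if subst is not None:
            return rewrite(instantiate(rhs, subst), ctx, rules, congruences)
    return term


# Hypothetical example: x ~ y means "x and y contain the same elements".
RULES: List[Rule] = [
    (("rev", "?x"), "?x", "equiv"),                 # rev x ~ x
    (("app", "?x", "?x"), "?x", "equiv"),           # app x x ~ x
    (("len", ("rev", "?x")), ("len", "?x"), "eq"),  # len (rev x) = len x
]
CONGRUENCES = {("member", 2)}  # membership depends only on the elements
```

With these rules, `("member", "a", ("rev", ("app", "xs", "xs")))` simplifies to `("member", "a", "xs")`. By contrast, `("len", ("app", "xs", "xs"))` is left alone, because `len` does not respect the equivalence.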
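The lexing half of the derivatives project (CU5) rests on Brzozowski derivatives, which fit in a few lines. The Python sketch below matches a string by taking the derivative with respect to each character in turn. It then tests whether the result accepts the empty string. The class names are chosen for this illustration. The sketch performs no simplification, so the expressions grow with the input. It also does not attempt the much harder extension to context-free grammars. According to Might et al., that extension additionally needs laziness, memoisation and fixed points.

```python
from dataclasses import dataclass


class Re:
    """Base class of regular expressions."""


@dataclass(frozen=True)
class Zero(Re):
    """Matches no string."""


@dataclass(frozen=True)
class One(Re):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Char(Re):
    c: str


@dataclass(frozen=True)
class Alt(Re):
    r1: Re
    r2: Re


@dataclass(frozen=True)
class Seq(Re):
    r1: Re
    r2: Re


@dataclass(frozen=True)
class Star(Re):
    r: Re


def nullable(r: Re) -> bool:
    """Does r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero and Char


def der(c: str, r: Re) -> Re:
    """Brzozowski derivative: der(c, r) matches s iff r matches c + s."""
    if isinstance(r, Char):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        first = Seq(der(c, r.r1), r.r2)
        return Alt(first, der(c, r.r2)) if nullable(r.r1) else first
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    return Zero()  # Zero and One


def matches(r: Re, s: str) -> bool:
    for c in s:
        r = der(c, r)
    return nullable(r)
```

With `r = Seq(Star(Alt(Char("a"), Char("b"))), Char("c"))`, i.e. (a|b)*c, `matches(r, "abac")` is true and `matches(r, "abab")` is false.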
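For the equivalence-checking project (CU6), the following Python sketch shows one derivative-based way of deciding equivalence, using Antimirov's partial derivatives. It explores pairs of sets of partial derivatives of the two expressions. It fails as soon as one side of a pair accepts the empty word and the other does not. Every expression has only finitely many partial derivatives, so the search terminates. The sketch is offered only as a simple baseline of the kind the project would compare against. It does not claim to coincide with the Antimirov and Mosses procedure in the cited paper, and all names are assumptions of this sketch.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, Set


class Re:
    """Base class of regular expressions."""


@dataclass(frozen=True)
class Zero(Re):
    pass


@dataclass(frozen=True)
class One(Re):
    pass


@dataclass(frozen=True)
class Char(Re):
    c: str


@dataclass(frozen=True)
class Alt(Re):
    r1: Re
    r2: Re


@dataclass(frozen=True)
class Seq(Re):
    r1: Re
    r2: Re


@dataclass(frozen=True)
class Star(Re):
    r: Re


def nullable(r: Re) -> bool:
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False


def times(ps: FrozenSet[Re], t: Re) -> FrozenSet[Re]:
    """Append t to every partial derivative, using One . t = t."""
    return frozenset(t if isinstance(p, One) else Seq(p, t) for p in ps)


def pder(c: str, r: Re) -> FrozenSet[Re]:
    """Antimirov's partial derivatives of r with respect to c."""
    if isinstance(r, Char):
        return frozenset([One()]) if r.c == c else frozenset()
    if isinstance(r, Alt):
        return pder(c, r.r1) | pder(c, r.r2)
    if isinstance(r, Seq):
        left = times(pder(c, r.r1), r.r2)
        return left | pder(c, r.r2) if nullable(r.r1) else left
    if isinstance(r, Star):
        return times(pder(c, r.r), r)
    return frozenset()  # Zero and One


def alphabet(r: Re) -> Set[str]:
    if isinstance(r, Char):
        return {r.c}
    if isinstance(r, (Alt, Seq)):
        return alphabet(r.r1) | alphabet(r.r2)
    if isinstance(r, Star):
        return alphabet(r.r)
    return set()


def equivalent(r1: Re, r2: Re) -> bool:
    """Breadth-first search over pairs of partial-derivative sets: the
    expressions differ iff some reachable pair disagrees on the empty word."""
    sigma = alphabet(r1) | alphabet(r2)
    start = (frozenset([r1]), frozenset([r2]))
    seen = {start}
    todo = deque([start])
    while todo:
        s1, s2 = todo.popleft()
        if any(map(nullable, s1)) != any(map(nullable, s2)):
            return False
        for c in sigma:
            pair = (frozenset().union(*(pder(c, p) for p in s1)),
                    frozenset().union(*(pder(c, p) for p in s2)))
            if pair not in seen:
                seen.add(pair)
                todo.append(pair)
    return True
```

For example, a*a and aa* are reported equivalent, while a* and (aa)* are not. In the latter case, after reading one `a` the first side accepts and the second does not.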
@@ -152,13 +258,13 @@
 
 </ul>
 </TD>
 </TR>
 </TABLE>
 
 <P><!-- Created: Tue Mar  4 00:23:25 GMT 1997 -->
 <!-- hhmts start -->
-Last modified: Thu Dec  1 18:10:37 GMT 2011
+Last modified: Fri Dec  2 03:26:32 GMT 2011
 <!-- hhmts end -->
 <a href="http://validator.w3.org/check/referer">[Validate this page.]</a>
 </BODY>
 </HTML>