49 <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>, |
50 <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>, |
50 <A HREF="http://www.scala-lang.org/">Scala</A>, |
51 <A HREF="http://www.scala-lang.org/">Scala</A>, |
51 <A HREF="http://caml.inria.fr/">OCaml</A>, ... are also OK). Starting point is |
52 <A HREF="http://caml.inria.fr/">OCaml</A>, ... are also OK). Starting point is |
52 the open source SAT-solver MiniSat (available <A HREF="http://minisat.se/Main.html">here</A>). |
53 the open source SAT-solver MiniSat (available <A HREF="http://minisat.se/Main.html">here</A>). |
53 The long-term hope is that your implementation becomes part of the interactive theorem prover |
54 The long-term hope is that your implementation becomes part of the interactive theorem prover |
54 <A HREF="http://www.cl.cam.ac.uk/research/hvg/isabelle/">Isabelle</A>.</p> |
55 <A HREF="http://www.cl.cam.ac.uk/research/hvg/isabelle/">Isabelle</A>. For this |
|
56 the SAT-solver needs to be implemented in ML.</p> |
55 |
57 |
56 <p> |
58 <p> |
57 <B>Tasks:</B> Understand MiniSat, design and code a SAT-solver in ML, |
59 <B>Tasks:</B> Understand MiniSat, design and code a SAT-solver in ML, |
58 empirical evaluation and tuning of your code.</p> |
60 empirical evaluation and tuning of your code.</p> |
59 |
61 |
60 <p> |
62 <p> |
61 <B>Literature:</B> A good starting point for reading about SAT-solving is the handbook |
63 <B>Literature:</B> A good starting point for reading about SAT-solving is the handbook |
62 article in <A HREF="http://www.cs.cornell.edu/gomes/papers/SATSolvers-KR-Handbook.pdf">here</A>. |
64 article <A HREF="http://www.cs.cornell.edu/gomes/papers/SATSolvers-KR-Handbook.pdf">here</A>. |
63 MiniSat is explained <A HREF="http://minisat.se/downloads/MiniSat.pdf">here</A> and |
65 MiniSat is explained <A HREF="http://minisat.se/downloads/MiniSat.pdf">here</A> and |
64 <A HREF="http://minisat.se/Papers.html">here</A>. The standard reference for ML is |
66 <A HREF="http://minisat.se/Papers.html">here</A>. The standard reference for ML is |
65 <A HREF="http://www.cl.cam.ac.uk/~lp15/MLbook/">here</A> (I can lend you my copy |
67 <A HREF="http://www.cl.cam.ac.uk/~lp15/MLbook/">here</A> (I can lend you my copy |
66 of this book for the duration of the project). The best free implementation of ML is |
68 of this book for the duration of the project). The best free implementation of ML is |
67 <A HREF="http://www.polyml.org/">PolyML</A>. |
69 <A HREF="http://www.polyml.org/">PolyML</A>. |
77 enough to implement in a reasonable amount of time a compiler to an |
79 enough to implement in a reasonable amount of time a compiler to an |
78 idealised assembly language (preferably |
80 idealised assembly language (preferably |
79 <A HREF="http://en.wikipedia.org/wiki/Typed_assembly_language">TAL</A>) or an abstract machine. |
81 <A HREF="http://en.wikipedia.org/wiki/Typed_assembly_language">TAL</A>) or an abstract machine. |
80 This has been explained in full detail in a PhD-thesis by Louis-Julien Guillemette |
82 This has been explained in full detail in a PhD-thesis by Louis-Julien Guillemette |
81 (available in English <A HREF="https://papyrus.bib.umontreal.ca/jspui/bitstream/1866/3454/6/Guillemette_Louis-Julien_2009_these.pdf">here</A>). He used <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A> |
83 (available in English <A HREF="https://papyrus.bib.umontreal.ca/jspui/bitstream/1866/3454/6/Guillemette_Louis-Julien_2009_these.pdf">here</A>). He used <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A> |
82 as his implementation language. Other choices are of course possible. |
84 as his implementation language. Other choices are possible. |
83 </p> |
85 </p> |
84 |
86 |
85 <p> |
87 <p> |
86 <b>Tasks:</b> |
88 <b>Tasks:</b> |
87 Read the relevant literature and implement the various components of a compiler |
89 Read the relevant literature and implement the various components of a compiler |
88 (parser, intermediate languages, simulator for the idealised assembly language). |
90 (parser, intermediate languages, simulator for the idealised assembly language). |
89 This project is for a good student with an interest in programming languages, |
91 This project is for a good student with an interest in programming languages, |
90 who can also translate abstract ideas into code. If it is too difficult, the project can |
92 who can also translate abstract ideas into code. If it is too difficult, the project can |
91 easily be scaled back to the |
93 be easily scaled down to the |
92 <A HREF="http://en.wikipedia.org/wiki/Simply_typed_lambda_calculus">simply-typed |
94 <A HREF="http://en.wikipedia.org/wiki/Simply_typed_lambda_calculus">simply-typed |
93 lambda calculus</A> (which is simpler than |
95 lambda calculus</A> (which is simpler than |
94 System F) or only some components of the compiler are implemented. |
96 System F) or to cover only some components of the compiler. |
95 </p> |
97 </p> |
96 |
98 |
97 <p> |
99 <p> |
98 <B>Literature:</B> |
100 <B>Literature:</B> |
99 The <A HREF="https://papyrus.bib.umontreal.ca/jspui/bitstream/1866/3454/6/Guillemette_Louis-Julien_2009_these.pdf">PhD-thesis</A> by Louis-Julien Guillemette is required reading. A shorter |
101 The <A HREF="https://papyrus.bib.umontreal.ca/jspui/bitstream/1866/3454/6/Guillemette_Louis-Julien_2009_these.pdf">PhD-thesis</A> by Louis-Julien Guillemette is required reading. A shorter |
105 </p> |
107 </p> |
106 |
108 |
107 <li> <H4>[CU3] Sorting Suffixes</H4> |
109 <li> <H4>[CU3] Sorting Suffixes</H4> |
108 |
110 |
109 <p><b>Description:</b> Given a string, take all its suffixes, and sort them. |
111 <p><b>Description:</b> Given a string, take all its suffixes, and sort them. |
110 This is often also called <A HREF="http://en.wikipedia.org/wiki/Suffix_array">suffix |
112 This is often called <A HREF="http://en.wikipedia.org/wiki/Suffix_array">suffix |
111 array sorting</A>. It sounds simple, but there are some difficulties. |
113 array sorting</A>. It sounds simple, but there are some difficulties. |
112 The naive algorithm would generate all (suffix) strings and sort them |
114 The naive algorithm would generate all suffix strings and sort them |
113 using a standard sorting algorithm, for example quick-sort. Unfortunately, |
115 using a standard sorting algorithm, for example |
114 this algorithm is not optimal (it does not take into account that you sort |
116 <A HREF="http://en.wikipedia.org/wiki/Quicksort">quicksort</A>. |
115 suffixes) and it also takes an quadratic amount of space, which is a |
117 The problem is that |
116 problem if you have to sort strings of several Mega-Bytes or even Giga-Bytes |
118 this algorithm is not optimal for suffix sorting: it does not take into account that you sort |
117 (happens often in biotech DNA information.<p> |
119 suffixes and it also takes a quadratic amount of space. This is a |
118 |
120 huge problem if you have to sort strings of several Megabytes or even Gigabytes, |
119 Aim: the notion of index on a text is central in many methods for text |
121 as happens often in biotech and DNA data mining. Suffix sorting is also a crucial operation for the |
120 processing and for the management of textual databases. Suffix Arrays is one |
122 <A HREF="http://en.wikipedia.org/wiki/Burrows–Wheeler_transform">Burrows-Wheeler transform</A> |
121 of these methods based on the sorted list of suffixes of the input text. The |
123 on which the data compression algorithm of the popular |
122 project consists in implementing a linear-time sorting algorithm and other |
124 <A HREF="http://en.wikipedia.org/wiki/Bzip2">bzip2</A> |
123 elements related to Suffix Array construction and to Burrows-Wheeler text |
125 program is based. |
124 compression. Plan: study of the sorting problem in the literature starting |
126 </p> |
125 with the reference below. Implementation of the sorting algorithm and the |
127 |
126 LCP computation to obtain a Suffix Array construction software. Then, using |
128 <p> |
127 this work, implementation of the algorithms described in the second |
129 There are more efficient algorithms for suffix sorting, for example |
128 reference below. Deliverables: report, suffix sorting and associated |
130 <A HREF="http://books.google.co.uk/books?id=Pn1sHToYf9oC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false">here</A> and |
129 software and their documentation. |
131 <A HREF="http://ls11-www.cs.uni-dortmund.de/people/rahmann/teaching/ss2008/AlgorithmenAufSequenzen/09-walk-bwt.pdf">here</A>. |
130 |
132 However, the most space-efficient algorithm for suffix sorting |
131 References: |
133 (<A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">here</A>) |
132 J. Kärkkäinen and P. Sanders, Simple linear work suffix array construction, in ICALP'03, LNCS 2719, Spinger, 2003, pp. 943--955. |
134 is horrendously complicated. Your task would be to understand it, and then implement it. |
133 M. Crochemore, J. Désarménien and D. Perrin, A note on the Burrows-Wheeler transformation, Theoret. Comput. Sci., 2005, to appear. |
135 </p> |
134 |
136 |
135 There is a horrendously complicated algorithm for solving these problems. |
137 <p> |
136 Your task would be to understand it, and then implement it. |
138 <B>Tasks:</B> |
137 |
139 Start by reading the literature about suffix sorting. Then work through the |
138 <li> <H5>[CU 4] Simplification modulo Equivalences in Isabelle</H5> |
140 12-page <A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">paper</A> |
139 In this project you have to extend the simplifier of the Isabelle theorem |
141 explaining the horrendously complicated algorithm and implement it. |
140 prover. Currently, the simplifier only rewrites terms according to equalities |
142 Time permitting, the work can include an implementation of the Burrows-Wheeler |
141 l = r. Provided ~ is an equivalence relation, the simplifier should also |
143 data compression. This project is for a good student who likes to study algorithms |
142 be able to rewrite terms according to equivalences of the form l ~ r. |
144 in depth. The project can be carried out in almost all programming languages, |
143 This project requires knowledge of the functional programming language ML. |
145 including C, Java, Scala, ML, Haskell and so on. |
144 |
146 </p> |
145 <li><h5>[CU 5] Parsing with Derivatives</h5> |
147 |
146 |
148 <p> |
147 Derivatives can be used to implement a regular expression matcher. In |
149 <B>Literature:</B> A good starting point for reading about suffix sorting is the |
148 this project you have to apply this technique to parsing. The starting |
150 <A HREF="http://books.google.co.uk/books?id=Pn1sHToYf9oC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false">book</A> by Crochemore. Two simple algorithms are also described |
149 point for this project is the paper "Yacc is Dead" by Matthew Might. |
151 <A HREF="http://ls11-www.cs.uni-dortmund.de/people/rahmann/teaching/ss2008/AlgorithmenAufSequenzen/09-walk-bwt.pdf">here</A>. The main literature is the 12-page |
150 |
152 <A HREF="http://www.cs.rutgers.edu/~muthu/fm072.pdf">article</A> about in-place |
151 <li> <H5>[CU 6] Equivalence Checking of Regular Expression using Antimirov's Method<H5> |
153 suffix sorting. The Burrows-Wheeler data compression is described |
|
154 <A HREF="http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.pdf">here</A>. |
|
155 </p> |
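<p>
To make the naive approach from the [CU3] description concrete, here is a minimal sketch in Python. It is not the in-place algorithm from the 12-page article. The function names <code>naive_suffix_array</code> and <code>bwt</code> and the sentinel character <code>$</code> are assumptions made for this illustration only. The sketch sorts suffix start positions by comparing the suffixes themselves, and then reads off the Burrows-Wheeler transform from the sorted order.
</p>

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text` in sorted order.

    Naive method: every suffix is materialised as a separate string for
    the comparison, so time and space grow quadratically with len(text).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform computed from the naive suffix array.

    Assumption: `sentinel` does not occur in `text` and compares smaller
    than every character of `text`.
    """
    s = text + sentinel
    # The transform takes, for each sorted suffix, the character that
    # precedes it in s (wrapping around for the suffix starting at 0).
    return "".join(s[i - 1] for i in naive_suffix_array(s))


if __name__ == "__main__":
    print(naive_suffix_array("banana"))
    print(bwt("banana"))
```

<p>
This is fine for short strings, but on inputs of several Megabytes the materialised suffixes make it impractical, which is the motivation for the more efficient algorithms cited in this project.
</p>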
|
156 |
|
157 <li> <H4>[CU4] Simplification with Equivalence Relations in the Isabelle Theorem Prover</H4> |
|
158 <p> |
|
159 <B>Description:</B> |
|
160 In this project you have to extend the simplifier of the |
|
161 <A HREF="http://isabelle.in.tum.de/">Isabelle theorem prover</A>. |
|
162 The simplifier is an important reasoning tool of this theorem prover: it |
|
163 replaces a term by another term that can be proved to be equal to it. However, |
|
164 currently the simplifier only rewrites terms according to equalities. |
|
165 Assuming ≈ is an equivalence relation, the simplifier should also be able |
|
166 to rewrite terms according to ≈. Since equivalence relations occur |
|
167 frequently in automated reasoning, this extension would make the simplifier |
|
168 more powerful and useful. The hope is that your code can go into the |
|
169 code base of Isabelle. |
|
170 </p> |
|
171 |
|
172 <p> |
|
173 <B>Tasks:</B> |
|
174 Read the <A HREF="http://www.springerlink.com/content/x7041m1807738832/">paper</A> |
|
175 about rewriting with equivalence relations. Get familiar with parts of the |
|
176 implementation of Isabelle (I will be of much help as I can). Implement |
|
177 the extension. This project is suitable for a student with a bit of math background. |
|
178 It requires knowledge of the functional programming language ML, which |
|
179 however can be learned quickly provided you have already written code |
|
180 in another functional programming language. |
|
181 </p> |
|
182 |
|
183 <p> |
|
184 <B>Literature:</B> A good starting point for reading about rewriting modulo equivalences |
|
185 is the paper <A HREF="http://www.springerlink.com/content/x7041m1807738832/">here</A>, |
|
186 which uses the ACL2 theorem prover. The implementation of the Isabelle theorem |
|
187 prover is described in much detail in this |
|
188 <A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Cookbook/">programming tutorial</A>. |
|
189 The standard reference for ML is |
|
190 <A HREF="http://www.cl.cam.ac.uk/~lp15/MLbook/">here</A> (I can lend you my copy |
|
191 of this book for the duration of the project). |
|
192 </p> |
|
193 |
|
194 |
|
195 <li><h4>[CU5] Lexing and Parsing with Derivatives</h4> |
|
196 |
|
197 <p> |
|
198 <B>Description:</B> |
|
199 Lexing and parsing are usually done using automated tools, like |
|
200 <A HREF="http://en.wikipedia.org/wiki/Lex_programming_tool">lex</A> and |
|
201 <A HREF="http://en.wikipedia.org/wiki/Yacc">yacc</A>. The problem |
|
202 with them is that they "work when they work", but if not, they are |
|
203 <A HREF="http://en.wikipedia.org/wiki/Black_box">black boxes</A> |
|
204 which are difficult to debug and change. They are really quite |
|
205 clumsy, to the point that Might wrote a paper titled |
|
206 "<A HREF="http://arxiv.org/pdf/1010.5023v1">Yacc is dead</A>".</p> |
|
207 |
|
208 <p> |
|
209 There is simple algorithm for regular expression matching (that is lexing). |
|
210 This algorithm was introduced by |
|
211 <A HREF="http://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)">Brzozowski</A> |
|
212 in 1964. It is based on the notion of derivatives of regular expressions and |
|
213 has proved <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">useful</A> |
|
214 for practical lexing. Last year the notion of derivatives was extended by |
|
215 <A HREF="http://matt.might.net/papers/might2011derivatives.pdf">Might et al</A> |
|
216 to <A HREF="http://en.wikipedia.org/wiki/Context-free_grammar">context free grammars</A> |
|
217 and parsing. |
|
218 </p> |
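<p>
As a rough illustration of the derivative idea for regular expressions, here is a minimal sketch in Python. The class names (<code>Zero</code>, <code>One</code>, <code>Chr</code>, <code>Alt</code>, <code>Seq</code>, <code>Star</code>) and the functions <code>nullable</code>, <code>der</code> and <code>matches</code> are naming assumptions for this example and are not taken from the cited papers. It performs no simplification of intermediate expressions, which a practical lexer would need.
</p>

```python
from dataclasses import dataclass


class Rexp:
    """Base class for regular expressions."""


@dataclass(frozen=True)
class Zero(Rexp):      # matches no string at all
    pass


@dataclass(frozen=True)
class One(Rexp):       # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Rexp):       # matches the single character c
    c: str


@dataclass(frozen=True)
class Alt(Rexp):       # r1 | r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Seq(Rexp):       # r1 followed by r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Star(Rexp):      # zero or more repetitions of r
    r: Rexp


def nullable(r: Rexp) -> bool:
    """Can r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero and Chr


def der(c: str, r: Rexp) -> Rexp:
    """Derivative of r with respect to c: matches what may follow c."""
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        # If r1 can be empty, c may also be consumed by r2.
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    return Zero()  # Zero and One


def matches(r: Rexp, s: str) -> bool:
    """Take the derivative for each character, then test nullability."""
    for c in s:
        r = der(c, r)
    return nullable(r)
```

<p>
For example, with <code>r = Seq(Star(Alt(Chr("a"), Chr("b"))), Chr("c"))</code>, which stands for (a|b)*c, the call <code>matches(r, "abac")</code> returns <code>True</code> and <code>matches(r, "ab")</code> returns <code>False</code>.
</p>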
|
219 |
|
220 <p> |
|
221 <B>Tasks:</B> Get familiar with the two algorithms and implement them. Regular |
|
222 expression matching is relatively simple; parsing with derivatives is the |
|
223 harder part. Therefore you should empirically evaluate this part and |
|
224 tune your implementation. The project can be carried out in almost all programming |
|
225 languages, including C, Java, Scala, ML, Haskell and so on. |
|
226 </p> |
|
227 |
|
228 <p> |
|
229 <B>Literature:</B> This |
|
230 <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">paper</A> |
|
231 gives a modern introduction to derivative based lexing. Derivative-based |
|
232 parsing is explained <A HREF="http://arxiv.org/pdf/1010.5023v1">here</A> |
|
233 and <A HREF="http://matt.might.net/papers/might2011derivatives.pdf">here</A>. |
|
234 </p> |
|
235 |
|
236 <li> <H4>[CU6] Equivalence Checking of Regular Expressions using the Method by Antimirov and Mosses</H4> |
|
237 |
|
238 <p> |
|
239 <B>Description:</B> |
|
240 Solving the problem of deciding equivalence of regular expressions can be used |
|
241 to decide a number of problems in automated reasoning. Therefore one likes to |
|
242 have a method for equivalence checking that is as fast as possible. |
|
243 </p> |
|
244 |
|
245 <p> |
|
246 <B>Tasks:</B> |
|
247 The task is to implement the algorithm by Antimirov and Mosses and compare it to |
|
248 other methods. Hopefully the algorithm can be tuned to be faster than other |
|
249 methods. |
|
250 </p> |
|
251 |
|
252 <p> |
|
253 <B>Literature:</B> |
|
254 Central to this project is the paper <A HREF="http://www.dcc.fc.up.pt/~nam/publica/ijcs08.pdf">here</A>. |
|
255 Other methods have been described, for example, |
|
256 <A HREF="http://www4.informatik.tu-muenchen.de/~krauss/papers/rexp.pdf">here</A>. |
|
257 </p> |
152 |
258 |
153 </ul> |
259 </ul> |
154 </TD> |
260 </TD> |
155 </TR> |
261 </TR> |
156 </TABLE> |
262 </TABLE> |
157 |
263 |
158 <P><!-- Created: Tue Mar 4 00:23:25 GMT 1997 --> |
264 <P><!-- Created: Tue Mar 4 00:23:25 GMT 1997 --> |
159 <!-- hhmts start --> |
265 <!-- hhmts start --> |
160 Last modified: Thu Dec 1 18:10:37 GMT 2011 |
266 Last modified: Fri Dec 2 03:26:32 GMT 2011 |
161 <!-- hhmts end --> |
267 <!-- hhmts end --> |
162 <a href="http://validator.w3.org/check/referer">[Validate this page.]</a> |
268 <a href="http://validator.w3.org/check/referer">[Validate this page.]</a> |
163 </BODY> |
269 </BODY> |
164 </HTML> |
270 </HTML> |