33 <H4>Email: christian dot urban at kcl dot ac dot uk, Office: Strand Building S1.27</H4> |
33 <H4>Email: christian dot urban at kcl dot ac dot uk, Office: Strand Building S1.27</H4> |
34 <H4>If you are interested in a project, please send me an email and we can discuss details. Please include |
34 <H4>If you are interested in a project, please send me an email and we can discuss details. Please include |
35 a short description about your programming skills and Computer Science background in your first email. |
35 a short description about your programming skills and Computer Science background in your first email. |
36 I will also need your King's username in order to book the project for you. Thanks.</H4> |
36 I will also need your King's username in order to book the project for you. Thanks.</H4> |
37 |
37 |
38 <H4>Note that besides being a lecturer at the theoretical end of Computer Science, I am also a passionate |
38 <H4>Note that besides being a lecturer in the theoretical part of Computer Science, I am also a passionate |
39 <A HREF="http://en.wikipedia.org/wiki/Hacker_(programmer_subculture)">hacker</A> … |
39 <A HREF="http://en.wikipedia.org/wiki/Hacker_(programmer_subculture)">hacker</A> … |
40 defined as “a person who enjoys exploring the details of programmable systems and |
40 defined as “a person who enjoys exploring the details of programmable systems and |
41 stretching their capabilities, as opposed to most users, who prefer to learn only the minimum |
41 stretching their capabilities, as opposed to most users, who prefer to learn only the minimum |
42 necessary.” I am always happy to supervise like-minded students.</H4> |
42 necessary.” I am always happy to supervise like-minded students.</H4> |
43 |
43 |
48 <B>Description:</b> |
48 <B>Description:</b> |
49 <A HREF="http://en.wikipedia.org/wiki/Regular_expression">Regular expressions</A> |
49 <A HREF="http://en.wikipedia.org/wiki/Regular_expression">Regular expressions</A> |
50 are extremely useful for many text-processing tasks...finding patterns in texts, |
50 are extremely useful for many text-processing tasks...finding patterns in texts, |
51 lexing programs, syntax highlighting and so on. Given that regular expressions were |
51 lexing programs, syntax highlighting and so on. Given that regular expressions were |
52 introduced in the 1950s by <A HREF="http://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</A>, you might think |
52 introduced in the 1950s by <A HREF="http://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</A>, you might think |
53 regular expressions have since been studied to death. But you would definitely be mistaken: in fact they are still |
53 regular expressions have since been studied and implemented to death. But you would definitely be mistaken: in fact they are still |
54 an active research area. For example |
54 an active research area. For example |
55 <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">this paper</A> |
55 <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">this paper</A> |
56 about regular expression matching and partial derivatives was presented this summer at the international |
56 about regular expression matching and partial derivatives was presented this summer at the international |
57 PPDP'12 conference.</p> |
57 PPDP'12 conference. The task in this project is to implement the results from this paper.</p> |
58 |
58 |
59 <p>The background for this project is that some regular expressions are |
59 <p>The background for this project is that some regular expressions are |
60 "<A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">evil</A>" |
60 "<A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">evil</A>" |
61 and can "stab you in the back" according to |
61 and can "stab you in the back" according to |
62 this recent <A HREF="http://tech.blog.cueup.com/regular-expressions-will-stab-you-in-the-back">blog post</A>. |
62 this <A HREF="http://tech.blog.cueup.com/regular-expressions-will-stab-you-in-the-back">blog post</A>. |
63 For example, if you use in <A HREF="http://www.python.org">Python</A> or |
63 For example, if you use in <A HREF="http://www.python.org">Python</A> or |
64 in <A HREF="http://www.ruby-lang.org/en/">Ruby</A> (probably also in other mainstream programming languages) the |
64 in <A HREF="http://www.ruby-lang.org/en/">Ruby</A> (probably also in other mainstream programming languages) the |
65 innocent-looking regular expression <code>a?{28}a{28}</code> and match it, say, against the string |
65 innocent-looking regular expression <code>a?{28}a{28}</code> and match it, say, against the string |
66 <code>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</code> (that is 28 <code>a</code>s), you will soon notice that your CPU usage goes to 100%. In fact, |
66 <code>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</code> (that is 28 <code>a</code>s), you will soon notice that your CPU usage goes to 100%. In fact, |
67 Python and Ruby need approximately 30 seconds for matching this string. You can try it for yourself: |
67 Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself: |
68 <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re.py">re.py</A> (Python version) and |
68 <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re.py">re.py</A> (Python version) and |
69 <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re-internal.rb">re.rb</A> |
69 <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re-internal.rb">re.rb</A> |
70 (Ruby version). You can imagine an attacker |
70 (Ruby version). You can imagine an attacker |
71 mounting a nice <A HREF="http://en.wikipedia.org/wiki/Denial-of-service_attack">DoS attack</A> against |
71 mounting a nice <A HREF="http://en.wikipedia.org/wiki/Denial-of-service_attack">DoS attack</A> against |
72 your program if it contains such an "evil" regular expression. Actually |
72 your program if it contains such an "evil" regular expression. Actually |
73 <A HREF="http://www.scala-lang.org/">Scala</A> (and also Java) are almost immune from such |
73 <A HREF="http://www.scala-lang.org/">Scala</A> (and also Java) are almost immune from such |
74 attacks as they can deal with strings of up to 4,300 <code>a</code>s in less than a second. But if you scale |
74 attacks as they can deal with strings of up to 4,300 <code>a</code>s in less than a second. But if you scale |
75 the regular expression and string further to, say, 4,600 <code>a</code>s, you get a <code>StackOverflowError</code> |
75 the regular expression and string further to, say, 4,600 <code>a</code>s, then you get a <code>StackOverflowError</code> |
76 exception crashing your program. |
76 potentially crashing your program. |
77 </p> |
77 </p> |
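<p>
The slowdown described above can be reproduced with a few lines of Python. This is only a minimal sketch and not the linked <code>re.py</code>: the function name <code>evil_match_seconds</code> is made up for illustration, and because some regex engines reject the stacked quantifier in <code>a?{n}</code>, the pattern is spelled out as <code>n</code> copies of <code>a?</code> followed by <code>n</code> copies of <code>a</code>. The values of <code>n</code> are kept small on purpose; the running time of a backtracking matcher grows roughly exponentially with <code>n</code>.
</p>

```python
import re
import time


def evil_match_seconds(n):
    """Match n copies of a? followed by n copies of a against a string of n a's.

    Returns (matched, elapsed_seconds). The name and interface are
    hypothetical and exist only for this sketch.
    """
    pattern = re.compile("a?" * n + "a" * n)  # unrolled form of a?{n}a{n}
    text = "a" * n
    start = time.perf_counter()
    matched = pattern.fullmatch(text) is not None
    elapsed = time.perf_counter() - start
    return matched, elapsed


# Small sizes only: each extra step makes a backtracking engine much slower.
for n in (8, 12, 16, 20):
    matched, seconds = evil_match_seconds(n)
    print(f"n={n:2d} matched={matched} time={seconds:.4f}s")
```

<p>
The string always matches (every <code>a?</code> matches the empty string and the remaining <code>a</code>s consume the input), so any growth in the printed times comes from the search strategy of the matcher, not from the input being rejected.
</p>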
78 |
78 |
79 <p> |
79 <p> |
80 On a rainy afternoon, I implemented |
80 On a rainy afternoon, I implemented |
81 <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re3.scala">this</A> |
81 <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/re3.scala">this</A> |
96 |
96 |
97 <p> |
97 <p> |
98 Now the guys from the |
98 Now the guys from the |
99 <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">PPDP'12-paper</A> mentioned |
99 <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">PPDP'12-paper</A> mentioned |
100 above claim they are even faster than me and can deal with even more features of regular expressions |
100 above claim they are even faster than me and can deal with even more features of regular expressions |
101 (for example subexpression matching, which my rainy-afternoon matcher lacks). I am sure they thought |
101 (for example subexpression matching, which my rainy-afternoon matcher cannot). I am sure they thought |
102 about the problem much longer than a single afternoon. The task |
102 about the problem much longer than a single afternoon. The task |
103 in this project is to find out how good they actually are by implementing the results from their paper. |
103 in this project is to find out how good they actually are by implementing the results from their paper. |
104 Their approach is based on the concept of partial derivatives introduced in 1994 by |
104 Their approach is based on the concept of partial derivatives introduced in 1994 by |
105 <A HREF="http://reference.kfupm.edu.sa/content/p/a/partial_derivatives_of_regular_expressio_1319383.pdf">Valentin Antimirov</A>. |
105 <A HREF="http://reference.kfupm.edu.sa/content/p/a/partial_derivatives_of_regular_expressio_1319383.pdf">Valentin Antimirov</A>. |
106 I used them once myself in a <A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Publications/rexp.pdf">paper</A> |
106 I used them once myself in a <A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Publications/rexp.pdf">paper</A> |
107 in order to prove the <A HREF="http://en.wikipedia.org/wiki/Myhill–Nerode_theorem">Myhill-Nerode theorem</A>. |
107 in order to prove the <A HREF="http://en.wikipedia.org/wiki/Myhill–Nerode_theorem">Myhill-Nerode theorem</A>. |
108 So I know they are worth their money. Still, it would be interesting to actually compare their results |
108 So I know they are worth their money. Still, it would be interesting to actually compare their results |
109 with my simple rainy-afternoon matcher and "blow away" the regular expression matchers in Python and Ruby (and possibly |
109 with my simple rainy-afternoon matcher and potentially "blow away" the regular expression matchers |
110 in Scala too). |
110 in Python and Ruby (and possibly in Scala too). |
111 </p> |
111 </p> |
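<p>
To give a rough idea of what matching with partial derivatives looks like, here is a minimal Python sketch. It is an assumption-laden illustration and neither the algorithm from the PPDP'12 paper (which also handles subexpression matching) nor my rainy-afternoon matcher: the tuple encoding of regular expressions and all function names (<code>pder</code>, <code>nullable</code>, <code>matches</code>, <code>evil</code>) are hypothetical. The partial derivative of a regular expression with respect to a character is a <em>set</em> of regular expressions; a string is accepted if, after taking partial derivatives for each of its characters, at least one expression in the set can match the empty string.
</p>

```python
# Regular expressions as nested tuples (encoding chosen for this sketch only).
ZERO = ("zero",)  # matches no string
ONE = ("one",)    # matches only the empty string


def char(c):
    return ("chr", c)


def alt(r1, r2):
    return ("alt", r1, r2)


def seq(r1, r2):
    # Simplify 1 . r to r so that the derivative sets stay small.
    return r2 if r1 == ONE else ("seq", r1, r2)


def star(r):
    return ("star", r)


def nullable(r):
    """True if r can match the empty string."""
    tag = r[0]
    if tag in ("one", "star"):
        return True
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "seq":
        return nullable(r[1]) and nullable(r[2])
    return False  # zero, chr


def pder(c, r):
    """Set of partial derivatives of r with respect to the character c."""
    tag = r[0]
    if tag == "chr":
        return {ONE} if r[1] == c else set()
    if tag == "alt":
        return pder(c, r[1]) | pder(c, r[2])
    if tag == "seq":
        left = {seq(d, r[2]) for d in pder(c, r[1])}
        return (left | pder(c, r[2])) if nullable(r[1]) else left
    if tag == "star":
        return {seq(d, r) for d in pder(c, r[1])}
    return set()  # zero, one


def matches(r, s):
    """Decide whether r matches the whole string s."""
    states = {r}
    for c in s:
        states = {d for q in states for d in pder(c, q)}
    return any(nullable(q) for q in states)


def evil(n):
    """Build a?{n}a{n} as n optional a's followed by n mandatory a's."""
    a = char("a")
    r = ONE
    for _ in range(n):
        r = seq(a, r)
    for _ in range(n):
        r = seq(alt(ONE, a), r)
    return r


print(matches(evil(28), "a" * 28))
```

<p>
Because the matcher tracks a set of remaining expressions instead of backtracking, the work per character stays bounded by the size of that set. How this compares in practice with the approach in the paper is exactly what the project is meant to find out.
</p>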
112 |
112 |
113 <p> |
113 <p> |
114 <B>Literature:</B> |
114 <B>Literature:</B> |
115 The place to start with this project is obviously this |
115 The place to start with this project is obviously this |
141 |
141 |
142 |
142 |
143 <li> <H4>[CU2] Automata Theory in Your Web-Browser</H4> |
143 <li> <H4>[CU2] Automata Theory in Your Web-Browser</H4> |
144 |
144 |
145 <p> |
145 <p> |
|
146 There are a number of classic algorithms in automata theory (such as the transformation of regular |
|
147 expressions into NFAs and DFAs, automata minimisation, subset construction). All these algorithms involve a fair |
|
148 amount of calculations, which cannot be easily done by hand. There are a few web applications that animate these |
|
149 calculations, for example <A HREF="http://hackingoff.com/compilers/regular-expression-to-nfa-dfa">this one</A>. |
|
150 </p> |
|
151 |
|
152 <p> |
|
153 There are now many useful libraries for JavaScript, for example, this one for graphs. There are also |
|
154 a number of new programming languages targeting JavaScript. This project is for someone who |
|
155 wants to get to know these languages by implementing and animating algorithms from automata |
|
156 theory or parsing. |
|
157 </p> |
|
158 |
|
159 <B>Literature:</B> |
|
160 The same general literature as in [CU1]. |
|
161 </p> |
|
162 |
|
163 <p> |
146 <B>Skills:</B> |
164 <B>Skills:</B> |
147 This is a project for a student with good programming skills. |
165 This is a project for a student with good programming skills. |
148 JavaScript or a similar web-programming language seems to be best suited |
166 JavaScript or a similar web-programming language seems to be best suited |
149 for this project. Some knowledge in HTML and CSS cannot hurd either. |
167 for this project. Some knowledge in HTML and CSS cannot hurt either. |
150 </p> |
168 </p> |
151 |
169 |
152 |
170 |
153 <!-- |
171 <!-- |
154 <li> <H4>[CU2] Equivalence Checking of Regular Expressions</H4> |
172 <li> <H4>[CU2] Equivalence Checking of Regular Expressions</H4> |
194 |
212 |
195 <li> <H4>[CU3] Machine Code Generation for a Simple Compiler</H4> |
213 <li> <H4>[CU3] Machine Code Generation for a Simple Compiler</H4> |
196 |
214 |
197 <p> |
215 <p> |
198 <b>Description:</b> |
216 <b>Description:</b> |
199 Compilers translate high-level programs that humans can read and write into |
217 Compilers translate high-level programs that humans can read into |
200 efficient machine code that can be run on a CPU or virtual machine. |
218 efficient machine code that can be run on a CPU or virtual machine. |
201 I recently implemented a very simple compiler for a very simple functional |
219 I recently implemented a very simple compiler for a very simple functional |
202 programming language following this |
220 programming language following this |
203 <A HREF="http://www.cs.princeton.edu/~dpw/papers/tal-toplas.pdf">paper</A> |
221 <A HREF="http://www.cs.princeton.edu/~dpw/papers/tal-toplas.pdf">paper</A> |
204 (also described <A HREF="http://www.cs.princeton.edu/~dpw/papers/tal-tr.pdf">here</A>). |
222 (also described <A HREF="http://www.cs.princeton.edu/~dpw/papers/tal-tr.pdf">here</A>). |
205 My code, written in <A HREF="http://www.scala-lang.org/">Scala</A>, of this compiler is |
223 My code, written in <A HREF="http://www.scala-lang.org/">Scala</A>, of this compiler is |
206 <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/compiler.scala">here</A>. |
224 <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/compiler.scala">here</A>. |
207 The compiler can deal with simple programs involving natural numbers, such |
225 The compiler can deal with simple programs involving natural numbers, such |
208 as Fibonacci numbers |
226 as Fibonacci |
209 or factorial (but it can be easily extended - that is not the point). |
227 or factorial (but it can be easily extended - that is not the point). |
210 </p> |
228 </p> |
211 |
229 |
212 <p> |
230 <p> |
213 While the hard work has been done (understanding the two papers above), |
231 While the hard work has been done (understanding the two papers above), |
337 <A HREF="http://en.wikipedia.org/wiki/Electronic_voting">electronic voting</A>, |
355 <A HREF="http://en.wikipedia.org/wiki/Electronic_voting">electronic voting</A>, |
338 which essentially is still an unsolved problem in Computer Science. The |
356 which essentially is still an unsolved problem in Computer Science. The |
339 students only need to be prevented from answering a question more than once, thus skewing |
357 students only need to be prevented from answering a question more than once, thus skewing |
340 any statistics. Unlike electronic voting, no audit trail needs to be kept |
358 any statistics. Unlike electronic voting, no audit trail needs to be kept |
341 for student polling. Restricting the number of answers can probably be solved |
359 for student polling. Restricting the number of answers can probably be solved |
342 by setting appropriate cookies on the students |
360 by setting appropriate cookies on the students' |
343 computers or smart phones. |
361 computers or smart phones. |
|
362 </p> |
|
363 |
|
364 <p> |
|
365 However, there is one restriction that makes this project harder than it seems |
|
366 at first sight. The department does not allow large server applications and databases |
|
367 to be run on calcium. So the problem should be solved with as few resources as possible |
|
368 on the "back-end" which collects the votes. |
344 </p> |
369 </p> |
345 |
370 |
346 <p> |
371 <p> |
347 <B>Literature:</B> |
372 <B>Literature:</B> |
348 The project requires fluency in a web-programming language (for example |
373 The project requires fluency in a web-programming language (for example |
349 <A HREF="http://en.wikipedia.org/wiki/JavaScript">Javascript</A>, |
374 <A HREF="http://en.wikipedia.org/wiki/JavaScript">Javascript</A>, |
350 <A HREF="http://en.wikipedia.org/wiki/PHP">PHP</A>, |
375 <A HREF="http://en.wikipedia.org/wiki/PHP">PHP</A>, |
351 Java, <A HREF="http://www.python.org">Python</A>, |
376 Java, <A HREF="http://www.python.org">Python</A>, |
352 <A HREF="http://en.wikipedia.org/wiki/Go_(programming_language)">Go</A>, |
377 <A HREF="http://en.wikipedia.org/wiki/Go_(programming_language)">Go</A>, |
353 <A HREF="http://www.scala-lang.org/">Scala</A>, |
378 <A HREF="http://www.scala-lang.org/">Scala</A>, |
354 <A HREF="http://en.wikipedia.org/wiki/Ruby_(programming_language)">Ruby</A>) |
379 <A HREF="http://en.wikipedia.org/wiki/Ruby_(programming_language)">Ruby</A>). |
355 and possibly a cloud application platform (for example |
|
356 <A HREF="https://developers.google.com/appengine/">Google App Engine</a> or |
|
357 <A HREF="http://www.heroku.com">Heroku</A>). |
|
358 For web-programming the |
380 For web-programming the |
359 <A HREF="http://www.udacity.com/overview/Course/cs253/CourseRev/apr2012">Web Application Engineering</A> |
381 <A HREF="http://www.udacity.com/overview/Course/cs253/CourseRev/apr2012">Web Application Engineering</A> |
360 course at <A HREF="http://www.udacity.com">Udacity</A> is a good starting point |
382 course at <A HREF="http://www.udacity.com">Udacity</A> is a good starting point |
361 to be aware of the issues involved. This course uses <A HREF="http://www.python.org">Python</A>. |
383 to be aware of the issues involved. This course uses <A HREF="http://www.python.org">Python</A>. |
362 To evaluate the answers from the student, Google's |
384 To evaluate the answers from the student, Google's |