1 <?xml version="1.0" encoding="utf-8"?> |
1 <?xml version="1.0" encoding="utf-8"?> |
2 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
2 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
3 <HEAD> |
3 <HEAD> |
4 <TITLE>2015/16 MSc Projects</TITLE> |
4 <TITLE>2016/17 BSc Projects</TITLE> |
5 <BASE HREF="http://www.inf.kcl.ac.uk/staff/urbanc/"> |
5 <BASE HREF="http://www.inf.kcl.ac.uk/staff/urbanc/"> |
6 <script type="text/javascript" src="striper.js"></script> |
6 <script type="text/javascript" src="striper.js"></script> |
7 <link rel="stylesheet" href="nominal.css"> |
7 <link rel="stylesheet" href="nominal.css"> |
8 <script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> |
8 <script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> |
9 </script> |
9 </script> |
33 <H2>2015/16 MSc Projects</H2> |
33 <H2>2016/17 BSc Projects</H2> |
34 <H4>Supervisor: Christian Urban</H4> |
34 <H4>Supervisor: Christian Urban</H4> |
35 <H4>Email: christian dot urban at kcl dot ac dot uk, Office: Strand Building S1.27</H4> |
35 <H4>Email: christian dot urban at kcl dot ac dot uk, Office: Strand Building S1.27</H4> |
36 <H4>If you are interested in a project, please send me an email and we can discuss details. Please include |
36 <H4>If you are interested in a project, please send me an email and we can discuss details. Please include |
37 a short description of your programming skills and Computer Science background in your first email. |
37 a short description of your programming skills and Computer Science background in your first email. |
38 I will also need your King's username in order to book the project for you. Thanks.</H4> |
38 Thanks.</H4> |
39 |
39 |
40 <H4>Note that besides being a lecturer at the theoretical end of Computer Science, I am also a passionate |
40 <H4>Note that besides being a lecturer at the theoretical end of Computer Science, I am also a passionate |
41 <A HREF="http://en.wikipedia.org/wiki/Hacker_(programmer_subculture)">hacker</A> … |
41 <A HREF="http://en.wikipedia.org/wiki/Hacker_(programmer_subculture)">hacker</A> … |
42 defined as “a person who enjoys exploring the details of programmable systems and |
42 defined as “a person who enjoys exploring the details of programmable systems and |
43 stretching their capabilities, as opposed to most users, who prefer to learn only the minimum |
43 stretching their capabilities, as opposed to most users, who prefer to learn only the minimum |
56 <A HREF="http://en.wikipedia.org/wiki/Regular_expression">Regular expressions</A> |
56 <A HREF="http://en.wikipedia.org/wiki/Regular_expression">Regular expressions</A> |
57 are extremely useful for many text-processing tasks, such as finding patterns in texts, |
57 are extremely useful for many text-processing tasks, such as finding patterns in texts, |
58 lexing programs, syntax highlighting and so on. Given that regular expressions were |
58 lexing programs, syntax highlighting and so on. Given that regular expressions were |
59 introduced in 1950 by <A HREF="http://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</A>, |
59 introduced in 1950 by <A HREF="http://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</A>, |
60 you might think regular expressions have since been studied and implemented to death. But you would definitely be |
60 you might think regular expressions have since been studied and implemented to death. But you would definitely be |
61 mistaken: in fact they are still an active research area. For example |
61 mistaken: in fact they are still an active research area. Off the top of my head, I can point |

62 you to research papers on this topic that have appeared in the last few years. |

63 For example, |
62 <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">this paper</A> |
64 <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">this paper</A> |
63 about regular expression matching and derivatives was presented at the international |
65 about regular expression matching and derivatives was presented at the international |
64 FLOPS'14 conference. The task in this project is to implement their results and use them for lexing.</p> |
66 FLOPS'14 conference. The task in this project is to implement their results and use them for lexing.</p> |
65 |
67 |
66 <p>The background for this project is that some regular expressions are |
68 <p>The background for this project is that some regular expressions are |
70 For example, if you match, in <A HREF="http://www.python.org">Python</A> or |
72 For example, if you match, in <A HREF="http://www.python.org">Python</A> or |
71 in <A HREF="http://www.ruby-lang.org/en/">Ruby</A> (or in a number of other mainstream programming languages), the |
73 in <A HREF="http://www.ruby-lang.org/en/">Ruby</A> (or in a number of other mainstream programming languages), the |
72 innocent-looking regular expression <code>a?{28}a{28}</code> against, say, the string |
74 innocent-looking regular expression <code>a?{28}a{28}</code> against, say, the string |
73 <code>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</code> (that is, 28 <code>a</code>s), you will soon notice that your CPU usage goes to 100%. In fact, |
75 <code>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</code> (that is, 28 <code>a</code>s), you will soon notice that your CPU usage goes to 100%. In fact, |
74 Python and Ruby need approximately 30 seconds of hard work to match this string. You can try it for yourself: |
76 Python and Ruby need approximately 30 seconds of hard work to match this string. You can try it for yourself: |
75 <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/re.py">re.py</A> (Python version) and |
77 <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/catastrophic.py">catastrophic.py</A> (Python version) and |
76 <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/re.rb">re.rb</A> |
78 <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/catastrophic.rb">catastrophic.rb</A> |
77 (Ruby version). You can imagine an attacker |
79 (Ruby version). Here is a similar problem in Java: <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/catastrophic.java">catastrophic.java</A> |
|
80 </p> |
|
81 |
|
82 <p> |
|
83 You can imagine an attacker |
78 mounting a nice <A HREF="http://en.wikipedia.org/wiki/Denial-of-service_attack">DoS attack</A> against |
84 mounting a nice <A HREF="http://en.wikipedia.org/wiki/Denial-of-service_attack">DoS attack</A> against |
79 your program if it contains such an “evil” regular expression. Actually |
85 your program if it contains such an “evil” regular expression. But it can also happen by accident: |
80 <A HREF="http://www.scala-lang.org/">Scala</A> (and also Java) are almost immune from such |
86 on 20 July 2016 the website <A HREF="http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016">Stack Exchange</A> |
81 attacks as they can deal with strings of up to 4,300 <code>a</code>s in less than a second. But if you scale |
87 was knocked offline because of an evil regular expression. One of their engineers talks about it in this |
|
88 <A HREF="https://vimeo.com/112065252">video</A>. A similar problem needed to be fixed in the |
|
89 <A HREF="http://davidvgalbraith.com/how-i-fixed-atom/">Atom</A> editor. |
|
90 A few implementations of regular expression matchers are almost immune to such problems. |
|
91 For example, <A HREF="http://www.scala-lang.org/">Scala</A> can deal with strings of up to 4,300 <code>a</code>s in less than a second. But if you scale |
82 the regular expression and string further to, say, 4,600 <code>a</code>s, then you get a <code>StackOverflowError</code> |
92 the regular expression and string further to, say, 4,600 <code>a</code>s, then you get a <code>StackOverflowError</code> |
83 potentially crashing your program. Moreover (besides the “minor” problem of being painfully slow), according to this |
93 potentially crashing your program. Moreover (besides the “minor” problem of being painfully slow), according to this |
84 <A HREF="http://www.haskell.org/haskellwiki/Regex_Posix">report</A> |
94 <A HREF="http://www.haskell.org/haskellwiki/Regex_Posix">report</A> |
85 nearly all regular expression matchers using the POSIX rules are actually buggy. |
95 nearly all regular expression matchers using the POSIX rules are actually buggy. |
86 </p> |
96 </p> |
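<p>You can watch the slowdown described above without waiting 30 seconds by timing smaller instances of the same expression. The Python sketch below is only an illustration. The helper names <code>evil_pattern</code> and <code>time_match</code> are hypothetical and are not part of the linked scripts. The sketch spells <code>a?{n}a{n}</code> out literally as <code>n</code> copies of <code>a?</code> followed by <code>n</code> copies of <code>a</code>, because Python's <code>re</code> does not accept the shorthand <code>a?{28}</code> as written. The running time should grow exponentially with <code>n</code>. The actual numbers depend on the machine and the Python version.</p>

```python
import re
import time


def evil_pattern(n: int) -> str:
    """a?{n}a{n} written out: n copies of 'a?' followed by n copies of 'a'."""
    return "a?" * n + "a" * n


def time_match(n: int) -> float:
    """Seconds that Python's backtracking matcher needs for a string of n 'a's."""
    pattern = re.compile(evil_pattern(n))
    text = "a" * n
    start = time.perf_counter()
    matched = pattern.fullmatch(text) is not None
    elapsed = time.perf_counter() - start
    assert matched  # n 'a's are always in the language of a?{n}a{n}
    return elapsed


if __name__ == "__main__":
    # Keep n well below 28: each extra 'a' makes the search substantially slower.
    for n in range(10, 21, 2):
        print(f"n = {n:2d}: {time_match(n):.4f} s")
```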
94 official matcher maxes out at 4,600 <code>a</code>s). My matcher is approximately |
104 official matcher maxes out at 4,600 <code>a</code>s). My matcher is approximately |
95 85 lines of code and based on the concept of |
105 85 lines of code and based on the concept of |
96 <A HREF="http://lambda-the-ultimate.org/node/2293">derivatives of regular expressions</A>. |
106 <A HREF="http://lambda-the-ultimate.org/node/2293">derivatives of regular expressions</A>. |
97 These derivatives were introduced in 1964 by <A HREF="http://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)"> |
107 These derivatives were introduced in 1964 by <A HREF="http://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)"> |
98 Janusz Brzozowski</A>, but according to this |
108 Janusz Brzozowski</A>, but according to this |
99 <A HREF="http://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf">paper</A> had been lost in the “sands of time”. |
109 <A HREF="https://www.cs.kent.ac.uk/people/staff/sao/documents/jfp09.pdf">paper</A> had been lost in the “sands of time”. |
100 The advantage of derivatives is that they side-step completely the usual |
110 The advantage of derivatives is that they side-step completely the usual |
101 <A HREF="http://hackingoff.com/compilers/regular-expression-to-nfa-dfa">translations</A> of regular expressions |
111 <A HREF="http://hackingoff.com/compilers/regular-expression-to-nfa-dfa">translations</A> of regular expressions |
102 into NFAs or DFAs, which can introduce the exponential behaviour exhibited by the regular |
112 into NFAs or DFAs, which can introduce the exponential behaviour exhibited by the regular |
103 expression matchers in Python and Ruby. |
113 expression matchers in Python and Ruby. |
104 </p> |
114 </p> |
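<p>The idea fits in a few dozen lines. What follows is a minimal Python sketch, not the 85-line matcher mentioned above. In it, <code>der</code> computes Brzozowski's derivative, <code>nullable</code> tests whether the empty string is matched, and <code>matches</code> takes one derivative per character. Two parts are assumptions made for this illustration:</p>
<ul>
<li>a constructor <code>NTimes</code> for bounded repetition, so that <code>a?{n}a{n}</code> is represented compactly;</li>
<li>the simplification rules in <code>simp</code>, applied after every step to keep the derivatives small.</li>
</ul>
<p>All names are hypothetical, and there is no star constructor because the example does not need one.</p>

```python
from dataclasses import dataclass
from typing import Union

# Regular expressions as immutable syntax trees.
@dataclass(frozen=True)
class Zero:            # matches no string
    pass

@dataclass(frozen=True)
class One:             # matches only the empty string
    pass

@dataclass(frozen=True)
class Char:            # matches the single character c
    c: str

@dataclass(frozen=True)
class Alt:             # r1 or r2
    r1: "Rexp"
    r2: "Rexp"

@dataclass(frozen=True)
class Seq:             # r1 followed by r2
    r1: "Rexp"
    r2: "Rexp"

@dataclass(frozen=True)
class NTimes:          # exactly n copies of r
    r: "Rexp"
    n: int

Rexp = Union[Zero, One, Char, Alt, Seq, NTimes]

def nullable(r: Rexp) -> bool:
    """Does r match the empty string?"""
    if isinstance(r, (Zero, Char)):
        return False
    if isinstance(r, One):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    if isinstance(r, NTimes):
        return r.n == 0 or nullable(r.r)
    raise TypeError(r)

def der(c: str, r: Rexp) -> Rexp:
    """Derivative: der(c, r) matches s exactly when r matches c + s."""
    if isinstance(r, (Zero, One)):
        return Zero()
    if isinstance(r, Char):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        if nullable(r.r1):
            return Alt(Seq(der(c, r.r1), r.r2), der(c, r.r2))
        return Seq(der(c, r.r1), r.r2)
    if isinstance(r, NTimes):
        # Also correct for nullable r: shorter repetitions are already covered.
        return Zero() if r.n == 0 else Seq(der(c, r.r), NTimes(r.r, r.n - 1))
    raise TypeError(r)

def simp(r: Rexp) -> Rexp:
    """Remove Zero/One clutter and duplicate alternatives after each step."""
    if isinstance(r, Alt):
        s1, s2 = simp(r.r1), simp(r.r2)
        if isinstance(s1, Zero):
            return s2
        if isinstance(s2, Zero):
            return s1
        return s1 if s1 == s2 else Alt(s1, s2)
    if isinstance(r, Seq):
        s1, s2 = simp(r.r1), simp(r.r2)
        if isinstance(s1, Zero) or isinstance(s2, Zero):
            return Zero()
        if isinstance(s1, One):
            return s2
        if isinstance(s2, One):
            return s1
        return Seq(s1, s2)
    return r

def matches(r: Rexp, s: str) -> bool:
    """One (simplified) derivative per character, then a nullability test."""
    for c in s:
        r = simp(der(c, r))
    return nullable(r)

def evil(n: int) -> Rexp:
    """a?{n}a{n}: n copies of (a or empty) followed by n copies of a."""
    return Seq(NTimes(Alt(Char("a"), One()), n), NTimes(Char("a"), n))

print(matches(evil(28), "a" * 28))  # True
print(matches(evil(28), "a" * 27))  # False
```

<p>With these simplifications, the derivatives of <code>evil(n)</code> stay roughly linear in size, so the match against 28 <code>a</code>s should finish without noticeable delay. Without <code>simp</code> the matcher would still be correct, but the derivatives would grow quickly. The sketch is recursive, so Python's default recursion limit makes it suitable only for moderate <code>n</code>.</p>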
122 |
132 |
123 <p> |
133 <p> |
124 <B>Literature:</B> |
134 <B>Literature:</B> |
125 The place to start with this project is obviously this |
135 The place to start with this project is obviously this |
126 <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">paper</A> |
136 <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">paper</A> |
127 and this <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">one</A>. |
137 and this <A HREF="http://www.inf.kcl.ac.uk/staff/urbanc/Publications/posix.pdf">one</A>. |
128 Traditional methods for regular expression matching are explained |
138 Traditional methods for regular expression matching are explained |
129 in the Wikipedia articles |
139 in the Wikipedia articles |
130 <A HREF="http://en.wikipedia.org/wiki/DFA_minimization">here</A> and |
140 <A HREF="http://en.wikipedia.org/wiki/DFA_minimization">here</A> and |
131 <A HREF="http://en.wikipedia.org/wiki/Powerset_construction">here</A>. |
141 <A HREF="http://en.wikipedia.org/wiki/Powerset_construction">here</A>. |
132 The authoritative <A HREF="http://infolab.stanford.edu/~ullman/ialc.html">book</A> |
142 The authoritative <A HREF="http://infolab.stanford.edu/~ullman/ialc.html">book</A> |
148 in functional languages like |
158 in functional languages like |
149 <A HREF="http://www.scala-lang.org/">Scala</A>, |
159 <A HREF="http://www.scala-lang.org/">Scala</A>, |
150 <A HREF="http://fsharp.org">F#</A>, |
160 <A HREF="http://fsharp.org">F#</A>, |
151 <A HREF="http://en.wikipedia.org/wiki/Standard_ML">ML</A>, |
161 <A HREF="http://en.wikipedia.org/wiki/Standard_ML">ML</A>, |
152 <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>, etc. Python and other non-functional languages |
162 <A HREF="http://haskell.org/haskellwiki/Haskell">Haskell</A>, etc. Python and other non-functional languages |
153 can also be used, but seem much less convenient. If you attend my Formal Languages and |
163 can also be used, but seem much less convenient. If you attend my Compilers and Formal Languages |
154 Automata module, that would obviously give you a head-start with this project. |
164 module, that would obviously give you a head-start with this project. |
155 </p> |
165 </p> |
156 |
166 |
157 <li> <H4>[CU2] A Compiler for a small Programming Language</H4> |
167 <li> <H4>[CU2] A Compiler for a small Programming Language</H4> |
158 |
168 |
159 <p> |
169 <p> |