# HG changeset patch
# User Christian Urban
Description:
@@ -56,15 +56,15 @@
mistaken: in fact they are still an active research area. For example
this paper
about regular expression matching and partial derivatives was presented last summer at the international
- PPDP'12 conference. They even work on a followup paper that has not yet been presented at any
- conference. The task in this project is to implement their results.
-
[CU1] Regular Expression Matching and Partial Derivatives
+[CU1] Regular Expression Matching and Derivatives
The background for this project is that some regular expressions are
“evil”
and can “stab you in the back” according to
this blog post.
For example, if you use in Python or
- in Ruby (probably also other mainstream programming languages) the
+ in Ruby (and also in a number of other mainstream programming languages, according to this
+ blog) the
innocently looking regular expression a?{28}a{28}
and match it, say, against the string
aaaaaaaaaaaaaaaaaaaaaaaaaaaa
(that is 28 a
s), you will soon notice that your CPU usage goes to 100%. In fact,
Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself:
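A minimal sketch of such an experiment in Python, for illustration only. It assumes the pattern is meant as n optional a's followed by n mandatory a's, and writes it out as n copies of `a?` followed by n copies of `a`; the function name `evil_match_seconds` is hypothetical. Timings depend on the machine and the Python version, so none are quoted here.

```python
import re
import time

def evil_match_seconds(n):
    """Match n optional a's followed by n a's against a string of n a's.

    Returns (matched, elapsed_seconds). The pattern is built by plain
    repetition, which is an assumption about how a?{n}a{n} is meant.
    """
    pattern = re.compile("a?" * n + "a" * n)
    text = "a" * n
    start = time.perf_counter()
    matched = pattern.fullmatch(text) is not None
    elapsed = time.perf_counter() - start
    return matched, elapsed

if __name__ == "__main__":
    # Start small: the running time of a backtracking matcher
    # grows very quickly with n on this family of patterns.
    for n in (10, 14, 18):
        matched, elapsed = evil_match_seconds(n)
        print(n, matched, f"{elapsed:.4f}s")
```

Increasing n step by step towards 28 shows how quickly the running time grows.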
@@ -76,7 +76,9 @@
Scala (and also Java) are almost immune from such
attacks as they can deal with strings of up to 4,300 a
s in less than a second. But if you scale
the regular expression and string further to, say, 4,600 a
s, then you get a StackOverflowError
- potentially crashing your program.
+ potentially crashing your program. Moreover (besides the problem of being painfully slow), according to this
+ report
+ nearly all POSIX regular expression matchers are actually buggy.
@@ -99,18 +101,19 @@
Now the authors from the - PPDP'12-paper mentioned + FLOPS'14-paper mentioned above claim they are even faster than me and can deal with even more features of regular expressions (for example subexpression matching, which my rainy-afternoon matcher cannot). I am sure they thought about the problem much longer than a single afternoon. The task in this project is to find out how good they actually are by implementing the results from their paper. - Their approach is based on the concept of partial derivatives introduced in 1994 by + Their approach is based on the concept of derivatives introduced in 1994 by Valentin Antimirov. I used them once myself in a paper in order to prove the Myhill-Nerode theorem. So I know they are worth their money. Still, it would be interesting to actually compare their results with my simple rainy-afternoon matcher and potentially “blow away” the regular expression matchers - in Python and Ruby (and possibly in Scala too). + in Python and Ruby (and possibly in Scala too). The application would be to implement a fast lexer for + programming languages.
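To give a rough idea of what matching by derivatives looks like, here is a small sketch in Python. It uses plain Brzozowski-style derivatives with a simple simplification step; it is not the algorithm from the paper mentioned above, it does not do subexpression matching, and it is not tuned for speed. All class and function names (`Zero`, `One`, `Chr`, `Alt`, `Seq`, `Star`, `nullable`, `der`, `simp`, `matches`, `evil`) are choices made for this illustration.

```python
from dataclasses import dataclass

class Rexp:
    """Base class for regular expressions."""

@dataclass(frozen=True)
class Zero(Rexp):      # matches nothing
    pass

@dataclass(frozen=True)
class One(Rexp):       # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr(Rexp):       # matches a single character
    c: str

@dataclass(frozen=True)
class Alt(Rexp):       # r1 | r2
    r1: Rexp
    r2: Rexp

@dataclass(frozen=True)
class Seq(Rexp):       # r1 followed by r2
    r1: Rexp
    r2: Rexp

@dataclass(frozen=True)
class Star(Rexp):      # zero or more repetitions of r
    r: Rexp

def nullable(r):
    """Can r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero and Chr

def der(c, r):
    """Derivative of r with respect to the character c."""
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    return Zero()  # Zero and One

def simp(r):
    """Remove obviously redundant Zero and One subterms."""
    if isinstance(r, Alt):
        a, b = simp(r.r1), simp(r.r2)
        if a == Zero():
            return b
        if b == Zero():
            return a
        return a if a == b else Alt(a, b)
    if isinstance(r, Seq):
        a, b = simp(r.r1), simp(r.r2)
        if Zero() in (a, b):
            return Zero()
        if a == One():
            return b
        if b == One():
            return a
        return Seq(a, b)
    return r

def matches(r, s):
    """Take the derivative for each character, then test nullability."""
    for c in s:
        r = simp(der(c, r))
    return nullable(r)

def evil(n):
    """n optional a's followed by n mandatory a's."""
    r = One()
    for _ in range(n):
        r = Seq(Chr("a"), r)
    for _ in range(n):
        r = Seq(Alt(Chr("a"), One()), r)
    return r
```

For example, `matches(evil(8), "a" * 8)` builds the "evil" expression for n = 8 and matches it character by character, without any backtracking.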
@@ -131,6 +134,7 @@ online book very helpful. Test cases for “evil” regular expressions can be obtained from here. +
@@ -179,7 +183,7 @@ JavaScript. This is the language that is supported by most browsers and therefore is a favourite vehicle for Web-programming. Some call it the scripting language of the Web. - Unfortunately, JavaScript is probably one of the worst + Unfortunately, JavaScript is also probably one of the worst languages to program in (being designed and released in a hurry). But it can be used as a convenient target for translating programs from other languages. In particular there are two very optimised subsets of JavaScript that can be used for this purpose: @@ -239,52 +243,11 @@
- PS: Compiler projects, like this [CU2] and [CU3], consistently received high marks in the past. + PS: Compiler projects, like this [CU2] and [CU6], consistently received high marks in the past. I supervised four so far and none of them received a mark below 70% - one even was awarded a prize.
-- Description: - JavaScript is a language that is supported by most - browsers and therefore is a favourite - vehicle for Web-programming. Some call it the scripting language of the Web. - Unfortunately, JavaScript is probably one of the worst - languages to program in (being designed and released in a hurry). But it can be used as a convenient target - for translating programs from other languages. In particular there are two - very optimised subsets of JavaScript that can be used for this purpose: - one is asm.js and the other is - emscripten. - There is a tutorial for emscripten - and an impressive demo which runs the - Unreal Engine 3 - in a browser with spectacular speed. This was achieved by compiling the - C-code of the Unreal Engine to the LLVM intermediate language and then translating the LLVM - code to JavaScript. -
- -- Skills: - This project is about exploring these two subsets of JavaScript and implement a translator - of a small language into them. This is similar to the project [CU2] above and requires - similar skills. In addition it would be good to have already some familiarity with JavaScript. - There are plenty of tutorials on the Web. - Here is a list of free books on JavaScript. - This is a project for a student who wants to get more familiar with JavaScript and Web-programming. - A project from which you can draw inspiration is this - List-to-JavaScript - translator. Here is another such project. - And another in less than 100 lines of code. - Coffeescript is a similar project - except that it is already quite mature. And finally not to - forget TypeScript developed by Microsoft. The main - difference between these projects and this one is that they translate into relatively high-level - JavaScript code; none of them use the much lower levels asm.js and - emscripten. -
- -The standard technology for writing scientific papers in Computer Science is to use @@ -345,10 +308,13 @@ parse a language and translate it to a suitable part of JavaScript using appropriate libraries. Tutorials for JavaScript are here. A parser generator for JavaScript is here. There are probably also - others. + others. If you want to avoid JavaScript there are a number of alternatives: for example the + Elm + language has been especially designed for easily implementing interactive animations, which would be + very convenient for this project.
-Description: @@ -416,77 +382,14 @@
Skills: - In order to provide convenience for the lecturer, this project needs very good web-programming skills. A + In order to provide convenience for the lecturer, this project needs very good web-programming skills. A hacker mentality - (see above) is probably very beneficial: web-programming is an area that only emerged recently and + (see above) is probably also very beneficial: web-programming is an area that only emerged recently and many tools still lack maturity. You probably have to experiment a lot with several different languages and tools.
-- Description: - The project aim is to implement an infrastructure for displaying and - animating code in a web-browser. The infrastructure should be agnostic - with respect to the programming language, but should be configurable. - I envisage something smaller than the projects - here (for Python), - here (for Java), - here (for multiple languages), - here (for HTML) - here (for JavaScript), - and here (for Scala). -
- -- The tasks in this project are being able (1) to lex and parse languages and (2) to write an interpreter. - The goal is to implement this as much as possible in a language-agnostic fashion. -
- -- Skills: - Good skill in lexing and language parsing, as well as being fluent with web programming (for - example JavaScript). -
- - -- Description: - There are many algorithms for synchronising clocks. This - paper - describes a new algorithm for clocks that communicate by exchanging - messages and thereby reach a state in which (within some bound) all clocks are synchronised. - A slightly longer and more detailed paper about the algorithm is - here. - The point of this project is to implement this algorithm and simulate networks of clocks. -
- -- Literature: - There is a wide range of literature on clock synchronisation algorithms. - Some pointers are given in this - paper, - which describes the algorithm to be implemented in this project. Pointers - are given also here. -
- -- Skills: - In order to implement a simulation of a network of clocks, you need to tackle - concurrency. You can do this for example in the programming language - Scala with the help of the - Akka library. This library enables you to send messages - between different actors. Here - are some examples that explain how to implement exchanging messages between actors. -
- - - - -Description: @@ -501,7 +404,7 @@ floods of online material). Google just released a framework - for web-programming and for turning Raspberry Pi's into webservers. + for web-programming on Raspberry Pis, turning them into webservers.
@@ -560,6 +463,109 @@
++ Description: + JavaScript is a language that is supported by most + browsers and therefore is a favourite + vehicle for Web-programming. Some call it the scripting language of the Web. + Unfortunately, JavaScript is probably one of the worst + languages to program in (being designed and released in a hurry). But it can be used as a convenient target + for translating programs from other languages. In particular there are two + very optimised subsets of JavaScript that can be used for this purpose: + one is asm.js and the other is + emscripten. + There is a tutorial for emscripten + and an impressive demo which runs the + Unreal Engine 3 + in a browser with spectacular speed. This was achieved by compiling the + C-code of the Unreal Engine to the LLVM intermediate language and then translating the LLVM + code to JavaScript. +
+ ++ Skills: + This project is about exploring these two subsets of JavaScript and implementing a translator + of a small language into them. This is similar to the project [CU2] above and requires + similar skills. In addition it would be good to already have some familiarity with JavaScript. + There are plenty of tutorials on the Web. + Here is a list of free books on JavaScript. + This is a project for a student who wants to get more familiar with JavaScript and Web-programming. + A project from which you can draw inspiration is this + List-to-JavaScript + translator. Here is another such project. + And another in less than 100 lines of code. + Coffeescript is a similar project + except that it is already quite mature. And finally not to + forget TypeScript developed by Microsoft. The main + difference between these projects and this one is that they translate into relatively high-level + JavaScript code; none of them use the much lower-level asm.js and + emscripten. +
+ + + ++ Description: + The project aim is to implement an infrastructure for displaying and + animating code in a web-browser. The infrastructure should be agnostic + with respect to the programming language, but should be configurable. + I envisage something smaller than the projects + here (for Python), + here (for Java), + here (for multiple languages), + here (for HTML) + here (for JavaScript), + and here (for Scala). +
+ ++ The tasks in this project are being able (1) to lex and parse languages and (2) to write an interpreter. + The goal is to implement this as much as possible in a language-agnostic fashion. +
+ ++ Skills: + Good skill in lexing and language parsing, as well as being fluent with web programming (for + example JavaScript). +
+ + ++ Description: + There are many algorithms for synchronising clocks. This + paper + describes a new algorithm for clocks that communicate by exchanging + messages and thereby reach a state in which (within some bound) all clocks are synchronised. + A slightly longer and more detailed paper about the algorithm is + here. + The point of this project is to implement this algorithm and simulate networks of clocks. +
+ ++ Literature: + There is a wide range of literature on clock synchronisation algorithms. + Some pointers are given in this + paper, + which describes the algorithm to be implemented in this project. Pointers + are given also here. +
+ ++ Skills: + In order to implement a simulation of a network of clocks, you need to tackle + concurrency. You can do this for example in the programming language + Scala with the help of the + Akka library. This library enables you to send messages + between different actors. Here + are some examples that explain how to implement exchanging messages between actors. +
+@@ -606,7 +612,7 @@
- Last modified: Tue Aug 5 17:37:49 BST 2014 + Last modified: Fri Sep 19 11:11:36 BST 2014 [Validate this page.]