# HG changeset patch
# User Christian Urban
# Date 1535576985 -3600
# Node ID 2abaef1458e9ef28f70ce8a7ce870a34a43fba0c
# Parent 10e59469ecf5ae00ed9c56ddd4e907a551e8c6aa
added projects

diff -r 10e59469ecf5 -r 2abaef1458e9 bsc-projects-18.html
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/bsc-projects-18.html Wed Aug 29 22:09:45 2018 +0100
@@ -0,0 +1,595 @@

2018/19 BSc Projects

Supervisor: Christian Urban

Email: christian dot urban at kcl dot ac dot uk, Office: Bush House N7.07

If you are interested in a project, please send me an email and we can discuss details. Please include a short description of your programming skills and Computer Science background in your first email. Thanks.

Note that besides being a lecturer at the theoretical end of Computer Science, I am also a passionate hacker … defined as “a person who enjoys exploring the details of programmable systems and stretching their capabilities, as opposed to most users, who prefer to learn only the minimum necessary.” I am always happy to supervise like-minded students.

In 2013/14, I was nominated by the students for the best BSc project supervisor and best MSc project supervisor awards in the NMS faculty. Somehow I won both. In 2014/15 I was nominated again for the best MSc project supervisor, but did not win it. ;o)

  • [CU1] Regular Expressions, Lexing and Derivatives

    Description: Regular expressions are extremely useful for many text-processing tasks, such as finding patterns in hostile network traffic, lexing programs, syntax highlighting and so on. Given that regular expressions were introduced in the 1950s by Stephen Kleene, you might think regular expressions have since been studied and implemented to death. But you would definitely be mistaken: in fact they are still an active research area. Off the top of my head, I can give you at least ten research papers that appeared in the last few years. For example this paper about regular expression matching and derivatives was presented in 2014 at the international FLOPS conference. Another paper by my PhD student and me was presented in 2016 at the international ITP conference. The task in this project is to implement these results and use them for lexing.

    The background for this project is that some regular expressions are “evil” and can “stab you in the back” according to this blog post. For example, if you use the innocent-looking regular expression a?{28}a{28} in Python or Ruby (or in a number of other mainstream programming languages) and match it, say, against the string aaaaaaaaaaaaaaaaaaaaaaaaaaaa (that is, 28 a's), you will soon notice that your CPU usage goes to 100%. In fact, Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself: catastrophic.py (Python version) and catastrophic.rb (Ruby version). Here is a similar problem in Java: catastrophic.java
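
    To get a feeling for the blow-up without waiting 30 seconds, the following is a minimal sketch (not the catastrophic.py linked above, whose contents are not reproduced here). It assumes the pattern is written with an explicit group, (a?){n}a{n}, and the helper name time_match is made up for illustration. It only measures how long Python's built-in re module needs for growing n.

```python
import re
import time

def time_match(n):
    """Match (a?){n}a{n} against n copies of 'a'; return (matched, seconds)."""
    # Explicit grouping of the optional 'a' is an assumption of this sketch.
    pattern = re.compile(r"(a?){%d}a{%d}" % (n, n))
    text = "a" * n
    start = time.perf_counter()
    matched = pattern.fullmatch(text) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    # Small values of n only; the running time roughly doubles with each step.
    for n in (10, 14, 18, 20):
        matched, seconds = time_match(n)
        print(f"n={n:2d} matched={matched} time={seconds:.4f}s")
```

    The string always matches; what changes is the time, because a backtracking matcher first tries to consume an a for every optional a? and has to undo these choices one by one.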

    You can imagine an attacker mounting a nice DoS attack against your program if it contains such an “evil” regular expression. But it can also happen by accident: on 20 July 2016 the website Stack Exchange was knocked offline because of an evil regular expression. One of their engineers talks about this in this video. A similar problem needed to be fixed in the Atom editor. A few implementations of regular expression matchers are almost immune to such problems. For example, Scala can deal with strings of up to 4,300 a's in less than a second. But if you scale the regular expression and string further to, say, 4,600 a's, then you get a StackOverflowError potentially crashing your program. Moreover (besides the "minor" problem of being painfully slow) according to this report nearly all regular expression matchers using the POSIX rules are actually buggy.

    On a rainy afternoon, I implemented this regular expression matcher in Scala. It is not as fast as the official one in Scala, but it can match up to 11,000 a's in less than 5 seconds without raising any exception (remember Python and Ruby both need nearly 30 seconds to process 28(!) a's, and Scala's official matcher maxes out at 4,600 a's). My matcher is approximately 85 lines of code and based on the concept of derivatives of regular expressions. These derivatives were introduced in 1964 by Janusz Brzozowski, but according to this paper had been lost in the “sands of time”. The advantage of derivatives is that they completely side-step the usual translations of regular expressions into NFAs or DFAs, which can introduce the exponential behaviour exhibited by the regular expression matchers in Python, Java and Ruby.
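
    The idea behind derivatives can be sketched in a few lines. The code below is an illustration in Python, not the Scala matcher linked above; all class and function names (Zero, One, NTimes, nullable, der, simp, matches, evil) are chosen for this sketch. The derivative of r with respect to a character c is a regular expression matching whatever may follow c in a string matched by r; a string matches if, after taking derivatives for all its characters, the remaining expression accepts the empty string.

```python
from dataclasses import dataclass

class Rexp:
    pass

@dataclass(frozen=True)
class Zero(Rexp):        # matches no string at all
    pass

@dataclass(frozen=True)
class One(Rexp):         # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr(Rexp):         # matches the single character c
    c: str

@dataclass(frozen=True)
class Alt(Rexp):         # r1 | r2
    r1: Rexp
    r2: Rexp

@dataclass(frozen=True)
class Seq(Rexp):         # r1 followed by r2
    r1: Rexp
    r2: Rexp

@dataclass(frozen=True)
class Star(Rexp):        # r*
    r: Rexp

@dataclass(frozen=True)
class NTimes(Rexp):      # r{n}
    r: Rexp
    n: int

def nullable(r):
    """Can r match the empty string?"""
    if isinstance(r, (Zero, Chr)):
        return False
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return r.n == 0 or nullable(r.r)          # NTimes

def der(c, r):
    """Derivative of r with respect to the character c."""
    if isinstance(r, (Zero, One)):
        return Zero()
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    if r.n == 0:                              # NTimes
        return Zero()
    return Seq(der(c, r.r), NTimes(r.r, r.n - 1))

def simp(r):
    """Remove redundant Zero/One parts so derivatives stay small."""
    if isinstance(r, Alt):
        a, b = simp(r.r1), simp(r.r2)
        if isinstance(a, Zero):
            return b
        if isinstance(b, Zero) or a == b:
            return a
        return Alt(a, b)
    if isinstance(r, Seq):
        a, b = simp(r.r1), simp(r.r2)
        if isinstance(a, Zero) or isinstance(b, Zero):
            return Zero()
        if isinstance(a, One):
            return b
        if isinstance(b, One):
            return a
        return Seq(a, b)
    return r

def matches(r, s):
    for c in s:
        r = simp(der(c, r))
    return nullable(r)

def evil(n):
    """The expression (a?){n}a{n}, written with an explicit group."""
    return Seq(NTimes(Alt(Chr("a"), One()), n), NTimes(Chr("a"), n))
```

    With these definitions, matches(evil(28), "a" * 28) returns immediately instead of after 30 seconds. Being recursive Python, the sketch will hit the interpreter's recursion limit long before 11,000 a's, so it says nothing about the performance of the Scala version.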

    Now the authors from the FLOPS'14-paper mentioned above claim they are even faster than me and can deal with even more features of regular expressions (for example subexpression matching, which my rainy-afternoon matcher cannot). I am sure they thought about the problem much longer than a single afternoon. The task in this project is to find out how good they actually are by implementing the results from their paper. Their approach to regular expression matching is also based on the concept of derivatives. I used derivatives very successfully once for something completely different in a paper about the Myhill-Nerode theorem. So I know they are worth their money. Still, it would be interesting to actually compare their results with my simple rainy-afternoon matcher and potentially “blow away” the regular expression matchers in Python, Ruby and Java (and possibly in Scala too). The application would be to implement a fast lexer for programming languages, or improve the network traffic analysers in the tools Snort and Bro.

    Literature: The place to start with this project is obviously this paper and this one. Traditional methods for regular expression matching are explained in the Wikipedia articles here and here. The authoritative book on automata and regular expressions is by John Hopcroft and Jeffrey Ullman (available in the library). There is also an online course about this topic by Ullman at Coursera, though IMHO not done with love. There are millions of other pointers about regular expression matching on the Web. I found the chapter on Lexing in this online book very helpful. Finally, it will be of great help for this project to take part in my Compilers and Formal Languages module (6CCS3CFL). Test cases for “evil” regular expressions can be obtained from here.

    Skills: This is a project for a student with an interest in theory and with good programming skills. The project can be easily implemented in functional languages like Scala, F#, ML, Haskell, etc. Python and other non-functional languages can also be used, but seem much less convenient. If you do attend my Compilers and Formal Languages module, that would obviously give you a head start with this project.

  • [CU2] A Compiler for a small Programming Language

    Description: Compilers translate high-level programs that humans can read and write into efficient machine code that can be run on a CPU or virtual machine. A compiler for a simple functional language generating x86 code is described here. I recently implemented a very simple compiler for an even simpler functional programming language following this paper (also described here). My Scala code for this compiler is here. The compiler can deal with simple programs involving natural numbers, such as Fibonacci numbers or factorial (but it can be easily extended - that is not the point).

    While the hard work has been done (understanding the two papers above), my compiler only produces some idealised machine code. For example I assume there are infinitely many registers. The goal of this project is to generate machine code that is more realistic and can run on a CPU, like x86, or run on a virtual machine, say the JVM. This would probably give a thousandfold speedup in comparison to my naive machine code and virtual machine. The project requires digging into the literature about real CPUs and generating real machine code.
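
    One reason a stack-based virtual machine like the JVM is a comfortable first target is that it needs no register allocation at all: operands live on a stack. The following is a hedged sketch of that idea, unrelated to the Scala compiler linked above. The tuple-based AST and the functions compile_expr and run are invented for illustration; ldc, iadd, isub and imul are real JVM mnemonics, but the little interpreter is only a toy stand-in for the JVM.

```python
import operator

# Source operators mapped to the JVM mnemonics for integer arithmetic.
JVM_OP = {"+": "iadd", "-": "isub", "*": "imul"}
# Toy semantics of these instructions, used by the stand-in interpreter.
IMPL = {"iadd": operator.add, "isub": operator.sub, "imul": operator.mul}

def compile_expr(e):
    """Translate a nested-tuple AST into a list of stack instructions."""
    if e[0] == "num":
        return [f"ldc {e[1]}"]
    if e[0] == "op":
        _, op, left, right = e
        # Post-order: code for both operands first, then the operation.
        return compile_expr(left) + compile_expr(right) + [JVM_OP[op]]
    raise ValueError(f"unknown node {e!r}")

def run(code):
    """Execute the instruction list on a simple operand stack."""
    stack = []
    for instr in code:
        if instr.startswith("ldc "):
            stack.append(int(instr[4:]))
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(IMPL[instr](left, right))
    return stack.pop()

# (1 + 2) * 3
example = ("op", "*", ("op", "+", ("num", 1), ("num", 2)), ("num", 3))
```

    Here compile_expr(example) yields the five instructions ldc 1, ldc 2, iadd, ldc 3, imul. A real code generator would additionally have to deal with functions, local variables, labels and jumps.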

    An alternative is to not generate machine code, but build a compiler that compiles to JavaScript. This is the language that is supported by most browsers and therefore is a favourite vehicle for Web-programming. Some call it the scripting language of the Web. Unfortunately, JavaScript is also probably one of the worst languages to program in (being designed and released in a hurry). But it can be used as a convenient target for translating programs from other languages. In particular there are two technologies that can be used for this purpose: one is asm.js, a highly optimisable subset of JavaScript, and the other is emscripten, a compiler that translates LLVM code to JavaScript. Since last year there is even the official WebAssembly. There is a tutorial for emscripten and an impressive demo which runs the Unreal Engine 3 in a browser with spectacular speed. This was achieved by compiling the C-code of the Unreal Engine to the LLVM intermediate language and then translating the LLVM code to JavaScript.
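
    To illustrate what "JavaScript as a target" means at its simplest, here is a small sketch that prints a tiny functional AST as JavaScript source text. The AST shape and the names to_js and fun_to_js are assumptions made for this example. Note that it emits ordinary high-level JavaScript, not asm.js; producing the lower-level form is exactly the harder part of the project.

```python
def to_js(e):
    """Render an expression (nested tuples) as a JavaScript expression."""
    kind = e[0]
    if kind == "num":
        return str(e[1])
    if kind == "var":
        return e[1]
    if kind == "op":                      # ("op", symbol, left, right)
        return f"({to_js(e[2])} {e[1]} {to_js(e[3])})"
    if kind == "if":                      # ("if", cond, then, else)
        return f"({to_js(e[1])} ? {to_js(e[2])} : {to_js(e[3])})"
    if kind == "call":                    # ("call", name, [args])
        args = ", ".join(to_js(a) for a in e[2])
        return f"{e[1]}({args})"
    raise ValueError(f"unknown node {e!r}")

def fun_to_js(name, params, body):
    """Render a function definition whose body is a single expression."""
    return f"function {name}({', '.join(params)}) {{ return {to_js(body)}; }}"

# fact(n) = if n === 0 then 1 else n * fact(n - 1)
fact_body = ("if",
             ("op", "===", ("var", "n"), ("num", 0)),
             ("num", 1),
             ("op", "*", ("var", "n"),
              ("call", "fact", [("op", "-", ("var", "n"), ("num", 1))])))
```

    Calling fun_to_js("fact", ["n"], fact_body) produces a one-line JavaScript function that can be pasted into a browser console.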

    Literature: There is a lot of literature about compilers (for example this book - I can lend you my copy for the duration of the project, or this online book). A very good overview article about implementing compilers by Laurie Tratt is here. An online book about the Art of Assembly Language is here. An introduction to x86 machine code is here. Intel's official manual for the x86 instruction set is here. Two assemblers for the JVM are described here and here. An interesting twist of this project is to not generate code for a CPU, but for the intermediate language of the LLVM compiler (also described here). If you want to see what machine code looks like you can compile your C-program using gcc -S.

    If JavaScript is chosen as a target instead, then there are plenty of tutorials on the Web. Here is a list of free books on JavaScript. A project from which you can draw inspiration is this Lisp-to-JavaScript translator. Here is another such project. And another in less than 100 lines of code. CoffeeScript is a similar project except that it is already quite mature. And finally not to forget TypeScript developed by Microsoft. The main difference between these projects and this one is that they translate into relatively high-level JavaScript code; none of them use the much lower-level asm.js and emscripten.

    Skills: This is a project for a student with a deep interest in programming languages and compilers. Since my compiler is implemented in Scala, it would make sense to continue this project in this language. I can be of help with questions and books about Scala. But if Scala is a problem, my code can also be translated quickly into any other functional language. Again, it will be of great help for this project to take part in my Compilers and Formal Languages module (6CCS3CFL).

    PS: Compiler projects consistently received high marks in the past. I have supervised eight so far and most of them received a mark above 70% - one even was awarded a prize.

  • [CU3] Slide-Making in the Web-Age

    The standard technology for writing scientific papers in Computer Science is to use LaTeX, a document preparation system originally implemented by Donald Knuth and Leslie Lamport. LaTeX produces very pleasant-looking documents, can deal nicely with mathematical formulas and is very flexible. If you are interested, here is a side-by-side comparison between Word and LaTeX (which LaTeX “wins” with 18 out of 21 points). Computer scientists not only use LaTeX for documents, but also for slides (really, nobody who wants to be cool uses Keynote or Powerpoint).

    Although used widely, LaTeX seems nowadays a bit dated for producing slides. Unlike documents, which are typically “static” and published in a book or journal, slides often contain changing contents that might first only be partially visible and only later be revealed as the “story” of a talk or lecture demands. Also slides often contain animated algorithms where each state in the calculation is best explained by highlighting the changing data.

    It seems HTML and JavaScript are much better suited for generating such animated slides. This page links to slide-generating programs using this combination of technologies. However, the problem with all of these projects is that they depend heavily on the users being able to write JavaScript, CSS or HTML... not something one would like to depend on given that “normal” users likely only have a LaTeX background. The aim of this project is to invent a very simple language that is inspired by LaTeX and then generate from code written in this language slides that can be displayed in a web-browser. An example would be the Madoko project.
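
    As a very rough sketch of what such a translation could look like, the following code turns a made-up, Beamer-like input into HTML sections. The input syntax (a frame environment with a title and \item lines), the chosen HTML tags and the function names are all assumptions for illustration; designing the actual language is the project. Text is escaped for HTML, so formulas written as \( ... \) would pass through unchanged for a library like MathJax to render.

```python
import html
import re

# One frame: \begin{frame}{Title} ... \end{frame} (hypothetical syntax).
FRAME = re.compile(r"\\begin\{frame\}\{(.*?)\}(.*?)\\end\{frame\}", re.S)
ITEM = "\\item"

def frame_to_html(title, body):
    """Turn one frame into a <section> with a heading and a bullet list."""
    items = [line.strip()[len(ITEM):].strip()
             for line in body.splitlines()
             if line.strip().startswith(ITEM)]
    bullets = "".join(f"<li>{html.escape(item)}</li>" for item in items)
    return f"<section><h2>{html.escape(title)}</h2><ul>{bullets}</ul></section>"

def slides_to_html(source):
    """Translate every frame in the source; one <section> per line."""
    return "\n".join(frame_to_html(t, b) for t, b in FRAME.findall(source))

example = r"""
\begin{frame}{Derivatives}
\item nullable
\item der
\end{frame}
"""
```

    A regular expression is enough for this toy input, but nested environments and overlays would need a proper lexer and parser.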

    This sounds complicated, but there is already some help available: MathJax is a JavaScript library that can be used to display mathematical text, for example

    When \(a \ne 0\), there are two solutions to \(ax^2 + bx + c = 0\) and they are \(x = {-b \pm \sqrt{b^2-4ac} \over 2a}\).

    by writing code in the familiar LaTeX way. This can be reused. Another such library is KaTeX. There are also plenty of JavaScript libraries for graphical animations (for example Raphael, SVG.JS, Bonsaijs, JSXGraph). The inspiration for how the user should be able to write slides could come from the LaTeX packages Beamer and PGF/TikZ. A slide-making project from which inspiration can be drawn is hyhyhy.

    Skills: This is a project that requires good knowledge of JavaScript. You need to be able to parse a language and translate it to a suitable part of JavaScript using appropriate libraries. Tutorials for JavaScript are here. A parser generator for JavaScript is here. There are probably also others. If you want to avoid JavaScript there are a number of alternatives: for example the Elm language has been especially designed for implementing interactive animations, which would be very convenient for this project. A nice slide-making project done by a previous student is MarkSlides by Oleksandr Cherednychenko.

  • [CU4] Raspberry Pi's and Arduinos

    Description: This project is for true hackers! Raspberry Pi's are small Linux computers the size of a credit-card and cost only £26; the simplest version even costs only £5 (see pictures on the left below). They were introduced in 2012 and people went crazy... well, some of them. There is a Google+ community about Raspberry Pi's that has more than 197k followers. It is hard to keep up with what people do with these small computers. The possibilities seem to be limitless. The main resource for Raspberry Pi's is here. There are magazines dedicated to them and tons of books (not to mention floods of online material, such as the RPi projects book). Google just released a framework for web-programming on Raspberry Pi's turning them into webservers. In my home one Raspberry Pi has the very important task of automatically filtering out nearly all advertisements using the Pi-Hole software (you cannot imagine what a difference this makes to your web experience... you just sit back and read what is important).

    Arduinos are slightly older (from 2005) but still very cool (see picture on the right below). They are small single-board micro-controllers that can talk to various external gadgets (sensors, motors, etc). Since Arduinos are open-software and open-hardware there are many clones and add-on boards. As with the Raspberry Pi, there is a lot of material available about Arduinos. The main reference is here. Like the Raspberry Pi's, the good thing about Arduinos is that they can be powered with simple AA-batteries.

    I have several Raspberry Pi's including wifi-connectors and two cameras. I also have two Freakduino Boards that are Arduinos extended with wireless communication. I can lend them to responsible students for one or two projects. However, the aim is to first come up with an idea for a project. Popular projects are automated temperature sensors, network servers, robots, web-cams (here is a web-cam directed at the Shard that can tell you whether it is raining or cloudy). There are plenty more ideas listed here for Raspberry Pi's and here for Arduinos.

    There are essentially two kinds of projects: One is purely software-based. Software projects for Raspberry Pi's are often written in Python, but since these are Linux-capable computers any other language would do as well. You can also write your own operating system as done here. For example the students here developed their own bare-metal OS and then implemented a chess-program on top of it (have a look at their very impressive YouTube video). The other kind of project is a combination of hardware and software; usually attaching some sensors or motors to the Raspberry Pi or Arduino. This might require some soldering or what is called a bread-board. But be careful before choosing a project involving new hardware: these devices can be destroyed (if “Vin connected to GND” or “drawing more than 30mA from a GPIO” does not make sense to you, you should probably stay away from such a project).

    [Pictures, left to right: Raspberry Pi, Raspberry Pi Zero, Arduino]

    Skills: Well, you must be a hacker; happy to make things. Your desk might look like the photo below on the left. The photo below in the middle shows an earlier student project which wirelessly connects a wearable Arduino (packaged in a "self-3d-printed" watch) to a Raspberry Pi seen in the background. The Arduino in the foreground takes measurements of heart rate and body temperature; the Raspberry Pi collects this data and makes it accessible via a simple web-service. The picture on the right is another project that implements an airmouse using an Arduino.

    [Photos, left to right: a hacker's desk, the wearable Arduino with its Raspberry Pi, the Arduino airmouse]

    A really cool project using a toy helicopter and two Raspberry Pi's was done by Nikolaos Kyknas. He transformed an off-the-shelf toy helicopter into an autonomous flying machine. He attached a Raspberry Pi Zero and an ultrasound sensor to the helicopter for measuring the distance from the ground. Another Raspberry Pi is attached to the “ground control unit” in order to give instructions to the throttle of the helicopter. Both Raspberry Pi's communicate over WiFi for calculating the next flight instruction. The goal is to find and maintain a steady altitude. Sounds simple? Well, not so fast: Rest assured there are many thorny issues! First you need to get the balance of the helicopter plus Raspberry Pi plus its power source just right, otherwise the helicopter will simply take off in random directions. Also the flight instructions need to be just right, otherwise the helicopter would at best “oscillate” around the set altitude, but never be steady. To solve this problem, Nikolaos used exactly the same algorithm that keeps cars at a steady pace when in cruise control.
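
    The text above does not name the algorithm. Cruise control is commonly implemented with a PID controller, so the following sketch assumes that this is what is meant; it is not the student's code. The gains, the one-dimensional toy physics (gravity plus a linear drag term, thrust given directly as an acceleration) and all names are invented for illustration.

```python
G = 9.81      # gravity in m/s^2
DRAG = 0.5    # made-up linear drag coefficient

class PID:
    """Textbook PID controller: output = kp*e + ki*integral(e) + kd*de/dt."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0          # no previous sample on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def simulate(target, steps=1500, dt=0.02):
    """Toy altitude simulation; returns the altitude after the last step."""
    pid = PID(kp=6.0, ki=3.0, kd=4.0)
    height, velocity = 0.0, 0.0
    for _ in range(steps):
        thrust = pid.update(target - height, dt)
        acceleration = thrust - G - DRAG * velocity
        velocity += acceleration * dt
        height += velocity * dt
    return height
```

    The integral part is what compensates for gravity once the error is zero, and the derivative part damps the oscillation mentioned above. The toy model ignores the ground and allows negative thrust, which a real throttle of course does not.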

  • [CU5] An Infrastructure for Displaying and Animating Code in a Web-Browser

    Description: The project aim is to implement an infrastructure for displaying and animating code in a web-browser. The infrastructure should be agnostic with respect to the programming language, but should be configurable. I envisage something smaller than the projects here (for Python), here (for Java), here (for multiple languages), here (for HTML), here (for JavaScript), and here (for Scala).

    The tasks in this project are being able (1) to lex and parse languages and (2) to write an interpreter. The goal is to implement this as much as possible in a language-agnostic fashion.
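
    The two tasks can be sketched for a deliberately tiny language of assignments with + and *. Everything here (the language, the token names and the function names) is made up for illustration. The point is the last function: instead of just computing a result, the interpreter records a snapshot of all variables after each line, which is the kind of data a browser front-end could animate.

```python
import re

# A token is a number, an identifier or any other single non-space character.
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")

def lex(line):
    tokens = []
    for num, name, op in TOKEN.findall(line):
        if num:
            tokens.append(("NUM", int(num)))
        elif name:
            tokens.append(("ID", name))
        else:
            tokens.append(("OP", op))
    return tokens

def parse_atom(toks, i):                 # atom := NUM | ID
    if i < len(toks) and toks[i][0] in ("NUM", "ID"):
        return toks[i], i + 1
    raise SyntaxError("expected a number or a variable")

def parse_term(toks, i):                 # term := atom ('*' atom)*
    node, i = parse_atom(toks, i)
    while i < len(toks) and toks[i] == ("OP", "*"):
        right, i = parse_atom(toks, i + 1)
        node = ("*", node, right)
    return node, i

def parse_expr(toks, i):                 # expr := term ('+' term)*
    node, i = parse_term(toks, i)
    while i < len(toks) and toks[i] == ("OP", "+"):
        right, i = parse_term(toks, i + 1)
        node = ("+", node, right)
    return node, i

def evaluate(node, env):
    tag = node[0]
    if tag == "NUM":
        return node[1]
    if tag == "ID":
        return env[node[1]]
    left, right = evaluate(node[1], env), evaluate(node[2], env)
    return left + right if tag == "+" else left * right

def run_with_trace(program):
    """Interpret 'name = expr' lines; return [(line number, variables)]."""
    env, trace = {}, []
    for line_no, line in enumerate(program.splitlines(), start=1):
        toks = lex(line)
        if not toks:
            continue                     # skip blank lines
        if len(toks) < 3 or toks[0][0] != "ID" or toks[1] != ("OP", "="):
            raise SyntaxError(f"line {line_no}: expected 'name = expr'")
        expr, end = parse_expr(toks, 2)
        if end != len(toks):
            raise SyntaxError(f"line {line_no}: unexpected trailing input")
        env[toks[0][1]] = evaluate(expr, env)
        trace.append((line_no, dict(env)))   # copy, so later steps do not alter it
    return trace
```

    A language-agnostic infrastructure would keep the trace format fixed and make the lexer, parser and interpreter configurable per language.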

    Skills: Good skills in lexing and language parsing, as well as being fluent with web programming (for example JavaScript).

  • [CU6] Proving the Correctness of Programs

    I am one of the main developers of the interactive theorem prover Isabelle. This theorem prover has been used to establish the correctness of some quite large programs (for example an operating system). Together with colleagues from Nanjing, I used this theorem prover to establish the correctness of a scheduling algorithm, called Priority Inheritance, for real-time operating systems. This scheduling algorithm is part of the operating system that drives, for example, the Mars rovers. Actually, the very first Mars rover mission in 1997 did not have this algorithm switched on and it almost caused a catastrophic mission failure (see this YouTube video here for an explanation of what happened). We were able to prove the correctness of this algorithm, but were also able to establish the correctness of some optimisations in this paper.

    On a much smaller scale, there are a few small programs and underlying algorithms where it + is not really understood whether they always compute a correct result (for example the + regular expression matcher by Sulzmann and Lu in project [CU1]). The aim of this + project is to completely specify an algorithm in Isabelle and then prove it correct (that is, + it always computes the correct result). +
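
    To give a flavour of what “correct” means formally, consider the simple derivative-based matcher rather than the Sulzmann and Lu algorithm. Assuming \(L(r)\) denotes the set of strings matched by the regular expression \(r\), and \(\mathit{ders}\) extends the derivative from single characters to strings (these names are chosen for this illustration), the statement to be proved would be roughly:

```latex
\forall r\; s.\quad
  \mathit{matches}(r, s) \;\Longleftrightarrow\; s \in L(r)
\qquad\text{where}\qquad
  \mathit{matches}(r, s) \;=\; \mathit{nullable}(\mathit{ders}(s, r))
```

    For an algorithm like the one by Sulzmann and Lu, which also reports how a string was matched, the specification additionally has to say which of the possible answers is the right one.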

    Skills: This project is for a very good student with a knack for theoretical things and formal reasoning.

  • [CU7] Anything Security Related that is Interesting

    If you have your own project that is related to security (must be something interesting), please propose it. We can then have a look whether it would be suitable for a project.

  • [CU8] Anything Interesting in the Areas

    • Elm (a reactive functional language for animating webpages; have a look at the cool examples, or here for an introduction)
    • SMLtoJS (an ML compiler to JavaScript; or anything else related to sane languages that compile to JavaScript)
    • Any statistical data related to Bitcoins (in the spirit of this paper or this one; this will probably require some extensive knowledge of C or another heavy-duty programming language)
    • Anything related to programming languages and formal methods (like static program analysis)
    • Anything related to low-cost, hands-on hardware like Raspberry Pi, Arduino, Cubieboard
    • Anything related to unikernel operating systems, like Xen or Mirage OS
    • Any kind of applied hacking, for example the Arduino-based keylogger described here
    • Anything related to code books, like this one
  • Earlier Projects

    I am also open to project suggestions from you. You might find some inspiration from my earlier projects: BSc 2012/13, MSc 2012/13, BSc 2013/14, MSc 2013/14, BSc 2014/15, MSc 2014/15, BSc 2015/16, MSc 2015/16, BSc 2016/17, MSc 2016/17, BSc 2017/18, MSc 2017/18

Time-stamp: <- 2017-09-27 12:44:13 by Christian Urban>

diff -r 10e59469ecf5 -r 2abaef1458e9 h1.mp4
Binary file h1.mp4 has changed
diff -r 10e59469ecf5 -r 2abaef1458e9 h3.mp4
Binary file h3.mp4 has changed