+| +
+2018/19 BSc Projects
+Supervisor: Christian Urban
+Email: christian dot urban at kcl dot ac dot uk,  Office: Bush House N7.07
+If you are interested in a project, please send me an email and we can discuss details. Please include
+a short description of your programming skills and Computer Science background in your first email.
+Thanks.
+Note that besides being a lecturer at the theoretical end of Computer Science, I am also a passionate
+    hacker …
+    defined as “a person who enjoys exploring the details of programmable systems and 
+    stretching their capabilities, as opposed to most users, who prefer to learn only the minimum 
+    necessary.” I am always happy to supervise like-minded students.
++
+In 2013/14, I was nominated by the students
+    for the best BSc project supervisor and best MSc project supervisor awards in the NMS
+    faculty. Somehow I won both. In 2014/15 I was nominated again for the best MSc
+    project supervisor, but did not win it. ;o)
++
+
++ [CU1] Regular Expressions, Lexing and Derivatives+
+
+  Description:  
+  Regular expressions 
+  are extremely useful for many text-processing tasks, such as finding patterns in hostile network traffic,
+  lexing programs, syntax highlighting and so on. Given that regular expressions were
+  introduced in the 1950s by Stephen Kleene,
+  you might think regular expressions have since been studied and implemented to death. But you would definitely be
+  mistaken: in fact they are still an active research area. Off the top of my head, I can give
+  you at least ten research papers that appeared in the last few years.
+  For example
+  this paper 
+  about regular expression matching and derivatives was presented in 2014 at the international 
+  FLOPS conference. Another paper by my PhD student and me was presented in 2016
+  at the international ITP conference.
+  The task in this project is to implement these results and use them for lexing.+
+ The background for this project is that some regular expressions are 
+  “evil”
+  and can “stab you in the back” according to
+  this blog post.
+  For example, if you use in Python or
+  in Ruby (or in a number of other mainstream programming languages) the
+  innocent-looking regular expression
+  a?{28}a{28}
+  and match it, say, against the string
+  aaaaaaaaaaaaaaaaaaaaaaaaaaaa
+  (that is, 28 a's), you will soon notice that your CPU usage goes to 100%. In fact,
+  Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself:
+  catastrophic.py (Python version) and
+  catastrophic.rb
+  (Ruby version). Here is a similar problem in Java: catastrophic.java
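A minimal Python sketch of such an experiment follows. It is not the linked catastrophic.py, whose contents are not reproduced here. It assumes the pattern is entered with an explicit group, as `(a?){n}a{n}`, and the helper names `evil_pattern` and `time_match` are made up for illustration. The size `n` is kept small on purpose, since the running time is expected to grow very quickly with `n`.

```python
import re
import time


def evil_pattern(n):
    """Build the regular expression (a?){n}a{n} as a string (assumed spelling)."""
    return "(a?){%d}a{%d}" % (n, n)


def time_match(n):
    """Match the pattern against n copies of 'a'; return (matched, seconds)."""
    start = time.perf_counter()
    matched = re.fullmatch(evil_pattern(n), "a" * n) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Deliberately small sizes; increase the upper bound with care.
    for n in range(10, 19, 2):
        matched, seconds = time_match(n)
        print(n, matched, round(seconds, 3))
```

The string always matches (every `a?` can match the empty string), so only the time taken is of interest.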
+ 
+  You can imagine an attacker
+  mounting a nice DoS attack against 
+  your program if it contains such an “evil” regular expression. But it can also happen by accident:
+  on 20 July 2016 the website Stack Exchange
+  was knocked offline because of an evil regular expression. One of their engineers talks about this in this
+  video. A similar problem needed to be fixed in the
+  Atom editor.
+  A few implementations of regular expression matchers are almost immune to such problems.
+  For example, Scala can deal with strings of up to 4,300
+  a's in less than a second. But if you scale
+  the regular expression and string further to, say, 4,600 a's, then you get a StackOverflowError
+  potentially crashing your program. Moreover (besides the "minor" problem of being painfully slow) according to this
+  report
+  nearly all regular expression matchers using the POSIX rules are actually buggy.
+ 
+  On a rainy afternoon, I implemented 
+  this 
+  regular expression matcher in Scala. It is not as fast as the official one in Scala, but
+  it can match up to 11,000 a's in less than 5 seconds without raising any exception
+  (remember Python and Ruby both need nearly 30 seconds to process 28(!) a's, and Scala's
+  official matcher maxes out at 4,600 a's). My matcher is approximately
+  85 lines of code and based on the concept of 
+  derivatives of regular expressions.
+  These derivatives were introduced in 1964 by 
+  Janusz Brzozowski, but according to this
+  paper had been lost in the “sands of time”.
+  The advantage of derivatives is that they side-step completely the usual 
+  translations of regular expressions
+  into NFAs or DFAs, which can introduce the exponential behaviour exhibited by the regular
+  expression matchers in Python, Java and Ruby.
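To make the idea of derivatives concrete, here is a small Python sketch of a Brzozowski-style matcher. It is not the 85-line Scala matcher mentioned above and makes no claim about its speed. All class and function names (`Rexp`, `Ntimes`, `der`, `simp`, `matches`, `evil`) are chosen for illustration only. The derivative of a regular expression `r` with respect to a character `c` describes what remains to be matched after `c` has been read, and a string matches if the final derivative accepts the empty string.

```python
from dataclasses import dataclass


class Rexp:
    """Base class for regular expressions."""


@dataclass(frozen=True)
class Zero(Rexp):      # matches no string at all
    pass


@dataclass(frozen=True)
class One(Rexp):       # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Rexp):       # matches the single character c
    c: str


@dataclass(frozen=True)
class Alt(Rexp):       # r1 or r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Seq(Rexp):       # r1 followed by r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Star(Rexp):      # zero or more repetitions of r
    r: Rexp


@dataclass(frozen=True)
class Ntimes(Rexp):    # exactly n repetitions of r
    r: Rexp
    n: int


def nullable(r):
    """Can r match the empty string?"""
    if isinstance(r, (Zero, Chr)):
        return False
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    if isinstance(r, Ntimes):
        return r.n == 0 or nullable(r.r)
    raise TypeError(r)


def der(c, r):
    """Derivative of r with respect to the character c."""
    if isinstance(r, (Zero, One)):
        return Zero()
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    if isinstance(r, Ntimes):
        return Zero() if r.n == 0 else Seq(der(c, r.r), Ntimes(r.r, r.n - 1))
    raise TypeError(r)


def simp(r):
    """Remove redundant Zero and One parts so derivatives stay small."""
    if isinstance(r, Alt):
        r1, r2 = simp(r.r1), simp(r.r2)
        if r1 == Zero():
            return r2
        if r2 == Zero():
            return r1
        return r1 if r1 == r2 else Alt(r1, r2)
    if isinstance(r, Seq):
        r1, r2 = simp(r.r1), simp(r.r2)
        if r1 == Zero() or r2 == Zero():
            return Zero()
        if r1 == One():
            return r2
        if r2 == One():
            return r1
        return Seq(r1, r2)
    return r


def matches(r, s):
    """Take the derivative for each character in turn, then test for the empty string."""
    for c in s:
        r = simp(der(c, r))
    return nullable(r)


def evil(n):
    """The expression a?{n}a{n}, with a? encoded as (a or empty)."""
    return Seq(Ntimes(Alt(Chr("a"), One()), n), Ntimes(Chr("a"), n))
```

For example, `matches(evil(28), "a" * 28)` evaluates to `True` without any backtracking. The functions are recursive, so very large expressions would run into Python's recursion limit in this simple form.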
+ 
+  Now the authors from the 
+  FLOPS'14-paper mentioned 
+  above claim they are even faster than me and can deal with even more features of regular expressions
+  (for example subexpression matching, which my rainy-afternoon matcher cannot). I am sure they thought
+  about the problem much longer than a single afternoon. The task 
+  in this project is to find out how good they actually are by implementing the results from their paper. 
+  Their approach to regular expression matching is also based on the concept of derivatives.
+  I used derivatives very successfully once for something completely different in a
+  paper 
+  about the Myhill-Nerode theorem.
+  So I know they are worth their money. Still, it would be interesting to actually compare their results
+  with my simple rainy-afternoon matcher and potentially “blow away” the regular expression matchers 
+  in Python, Ruby and Java (and possibly in Scala too). The application would be to implement a fast lexer for
+  programming languages, or improve the network traffic analysers in the tools Snort and
+  Bro.
+  +
+ 
+  Literature: 
+  The place to start with this project is obviously this
+  paper
+  and this one.
+  Traditional methods for regular expression matching are explained
+  in the Wikipedia articles 
+  here and 
+  here.
+  The authoritative book
+  on automata and regular expressions is by John Hopcroft and Jeffrey Ullman (available in the library). 
+  There is also an online course about this topic by Ullman at 
+  Coursera, though IMHO not 
+  done with love. 
+  There are millions of other pointers about regular expression
+  matching on the Web. I found the chapter on Lexing in this
+  online book very helpful. Finally, it will
+  be of great help for this project to take part in my Compilers and Formal Languages module (6CCS3CFL).
+  Test cases for “evil”
+  regular expressions can be obtained from here.
+  +
+ 
+  Skills: 
+  This is a project for a student with an interest in theory and with
+  good programming skills. The project can be easily implemented
+  in functional languages like
+  Scala,
+  F#, 
+  ML,  
+  Haskell, etc. Python and other non-functional languages
+  can also be used, but seem much less convenient. If you do attend my Compilers and Formal Languages
+  module, that would obviously give you a head-start with this project.
+  +  
+ [CU2] A Compiler for a small Programming Language+
+
+  Description: 
+  Compilers translate high-level programs that humans can read and write into
+  efficient machine code that can be run on a CPU or virtual machine.
+  A compiler for a simple functional language generating X86 code is described
+  here.
+  I recently implemented a very simple compiler for an even simpler functional
+  programming language following this 
+  paper 
+  (also described here).
+  My code, written in Scala, of this compiler is 
+  here.
+  The compiler can deal with simple programs involving natural numbers, such
+  as Fibonacci numbers or factorial (but it can be easily extended - that is not the point).
+  +
+ 
+  While the hard work has been done (understanding the two papers above),
+  my compiler only produces some idealised machine code. For example I
+  assume there are infinitely many registers. The goal of this
+  project is to generate machine code that is more realistic and can
+  run on a CPU, like X86, or run on a virtual machine, say the JVM. 
+  This would probably give a thousand-fold speedup in comparison to
+  my naive machine code and virtual machine. The project
+  requires digging into the literature about real CPUs and generating
+  real machine code.
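To give a flavour of what generating code for a stack-based virtual machine involves, here is a toy sketch in Python. The expression representation (nested tuples), the instruction names (`push`, `add`, `mul`) and the small interpreter `run` are all invented for illustration. They are neither JVM bytecode nor taken from the Scala compiler mentioned above.

```python
def compile_expr(e):
    """Translate an expression into a list of stack-machine instructions.

    An expression is either an int or a tuple (op, left, right),
    where op is "+" or "*".
    """
    if isinstance(e, int):
        return [("push", e)]
    op, left, right = e
    opcode = {"+": "add", "*": "mul"}[op]
    # Code for both operands comes first, then the instruction combining them.
    return compile_expr(left) + compile_expr(right) + [(opcode,)]


def run(code):
    """Execute the instructions on an operand stack and return the result."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "add" else a * b)
    return stack.pop()


if __name__ == "__main__":
    program = ("+", 1, ("*", 2, 3))   # stands for 1 + 2 * 3
    print(compile_expr(program))
    print(run(compile_expr(program)))
```

A stack machine avoids the question of how many registers exist. When targeting a CPU such as X86, values would additionally have to be assigned to a finite set of registers.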
+  + 
+  An alternative is to not generate machine code, but build a compiler that compiles to
+  JavaScript. This is the language that is supported by most
+  browsers and therefore is a favourite
+  vehicle for Web-programming. Some call it the scripting language of the Web.
+  Unfortunately, JavaScript is also probably one of the worst
+  languages to program in (being designed and released in a hurry). But it can be used as a convenient target
+  for translating programs from other languages. In particular there are two
+  technologies that can be used for this purpose:
+  one is asm.js, a highly optimisable subset of JavaScript, and the other is
+  emscripten, a compiler toolchain that translates LLVM code to JavaScript. Since
+  last year there is even the official WebAssembly.
+  There is a tutorial for emscripten
+  and an impressive demo which runs the
+  Unreal Engine 3
+  in a browser with spectacular speed. This was achieved by compiling the
+  C-code of the Unreal Engine to the LLVM intermediate language and then translating the LLVM
+  code to JavaScript.
+  +
+ 
+  Literature:
+  There is a lot of literature about compilers 
+  (for example this book -
+  I can lend you my copy for the duration of the project, or this
+  online book). A very good overview article
+  about implementing compilers by 
+  Laurie Tratt is 
+  here.
+  An online book about the Art of Assembly Language is
+  here.
+  An introduction into x86 machine code is here.
+  Intel's official manual for the x86 instruction set is 
+  here. 
+  Two assemblers for the JVM are described here
+  and here.
+  An interesting twist of this project is to not generate code for a CPU, but
+  for the intermediate language of the LLVM compiler
+  (also described here). If you want to see
+  what machine code looks like you can compile your C-program using gcc -S.
+  + 
+  If JavaScript is chosen as a target instead, then there are plenty of tutorials on the Web.
+  Here is a list of free books on JavaScript.
+  A project from which you can draw inspiration is this
+  Lisp-to-JavaScript
+  translator. Here is another such project.
+  And another in less than 100 lines of code.
+  Coffeescript is a similar project
+  except that it is already quite mature. And finally not to
+  forget TypeScript developed by Microsoft. The main
+  difference between these projects and this one is that they translate into relatively high-level
+  JavaScript code; none of them use the much lower-level asm.js and 
+  emscripten.
+  + 
+  Skills: 
+  This is a project for a student with a deep interest in programming languages and
+  compilers. Since my compiler is implemented in Scala,
+  it would make sense to continue this project in this language. I can be
+  of help with questions and books about Scala.
+  But if Scala is a problem, my code can also be translated quickly into any other functional
+  language. Again, it will be of great help for this project to take part in
+  my Compilers and Formal Languages module (6CCS3CFL).
+  +
+ 
+  PS: Compiler projects consistently received high marks in the past.
+  I have supervised eight so far and most of them received a mark above 70% - one even was awarded a prize.
+  +
+ [CU3] Slide-Making in the Web-Age+
+
+  The standard technology for writing scientific papers in Computer Science is to use
+  LaTeX, a document preparation
+  system originally implemented by Donald Knuth
+  and Leslie Lamport.
+  LaTeX produces very pleasant-looking documents, can deal nicely with mathematical
+  formulas and is very flexible. If you are interested, here
+  is a side-by-side comparison between Word and LaTeX (which LaTeX “wins” with 18 out of 21 points).
+  Computer scientists not only use LaTeX for documents,
+  but also for slides (really, nobody who wants to be cool uses Keynote or Powerpoint).
+  +
+ 
+  Although used widely, LaTeX seems nowadays a bit dated for producing
+  slides. Unlike documents, which are typically “static” and published in a book or journal,
+  slides often contain changing contents that might first only be partially visible and
+  only later be revealed as the “story” of a talk or lecture demands.
+  Also slides often contain animated algorithms where each state in the
+  calculation is best explained by highlighting the changing data.
+  +
+ 
+  It seems HTML and JavaScript are much better suited for generating
+  such animated slides. This page
+  links to slide-generating programs using this combination of technologies. 
+  However, the problem with all of these projects is that they depend heavily on the users being
+  able to write JavaScript, CSS or HTML...not something one would like to depend on given that
+  “normal” users likely only have a LaTeX background. The aim of this project is to invent a
+  very simple language that is inspired by LaTeX and then generate from code written in this language
+  slides that can be displayed in a web-browser. An example would be the
+  Madoko project.
+  +
+ 
+ This sounds complicated, but there is already some help available:
+ Mathjax is a JavaScript library that can
+ be used to display mathematical text, for example
+
+ When \(a \ne 0\), there are two solutions to \(ax^2 + bx + c = 0\) and they are
+ \(x = {-b \pm \sqrt{b^2-4ac} \over 2a}\).
+
+ by writing code in the familiar LaTeX-way. This can be reused.
+ Another such library is KaTeX.
+ There are also plenty of JavaScript
+ libraries for graphical animations (for example
+ Raphael,
+ SVG.JS,
+ Bonsaijs,
+ JSXGraph). The inspiration for how the user should be able to write
+ slides could come from the LaTeX packages Beamer
+ and PGF/TikZ. A slide-making project from which
+ inspiration can be drawn is hyhyhy.
+ +
+ 
+  Skills: 
+  This is a project that requires good knowledge of JavaScript. You need to be able to
+  parse a language and translate it to a suitable part of JavaScript using
+  appropriate libraries. Tutorials for JavaScript are here.
+  A parser generator for JavaScript is here. There are probably also
+  others. If you want to avoid JavaScript there are a number of alternatives: for example the
+  Elm
+  language has been especially designed for implementing interactive animations, which would be
+  very convenient for this project. A nice slide making project done by a previous student is 
+  MarkSlides by Oleksandr Cherednychenko. 
+  +
+ [CU4] Raspberry Pi's and Arduinos+
+
+  Description:
+  This project is for true hackers! Raspberry Pi's
+  are small Linux computers the size of a credit-card and only cost £26, the
+  simplest version even costs only £5 (see pictures on the left below). They were introduced
+  in 2012 and people went crazy...well some of them. There is a
+  Google+
+  community about Raspberry Pi's that has more
+  than 197k followers. It is hard to keep up with what people do with these small computers. The possibilities
+  seem to be limitless. The main resource for Raspberry Pi's is here.
+  There are magazines dedicated to them and tons of
+  books (not to mention
+  floods of online material,
+  such as the RPi projects book).
+  Google just released a
+  framework
+  for web-programming on Raspberry Pi's turning them into webservers.
+  In my home one Raspberry Pi has the very important task of automatically filtering out
+  nearly all advertisements using the 
+  Pi-Hole software
+  (you cannot imagine what a difference this makes to your web experience...you just sit back and read what
+  is important).
+  +
+ 
+  Arduinos are slightly older (from 2005) but still very cool (see picture on the right below). They
+  are small single-board micro-controllers that can talk to various external gadgets (sensors, motors, etc). Since Arduinos
+  are open-software and open-hardware there are many clones and add-on boards. Like for the Raspberry Pi, there
+  is a lot of material available about Arduinos.
+  The main reference is here. Like the Raspberry Pi's, the good thing about
+  Arduinos is that they can be powered with simple AA-batteries.
+  +
+ 
+  I have several Raspberry Pi's including wifi-connectors and two cameras.
+  I also have two Freakduino Boards that are Arduinos extended with wireless communication. I can lend them to responsible
+  students for one or two projects. However, the aim is to first come up with an idea for a project. Popular projects are
+  automated temperature sensors, network servers, robots, web-cams (here
+  is a web-cam directed at the Shard that can
+  tell
+  you whether it is raining or cloudy). There are plenty more ideas listed
+  here for Raspberry Pi's and
+  here for Arduinos.
+  +
+ 
+  There are essentially two kinds of projects: One is purely software-based. Software projects for Raspberry Pi's are often
+  written in Python, but since these are Linux-capable computers any other
+  language would do as well. You can also write your own operating system as done
+  here. For example the students
+  here developed their own bare-metal OS and then implemented
+  a chess-program on top of it (have a look at their very impressive
+  youtube video).
+  The other kind of project is a combination of hardware and software; usually attaching some sensors
+  or motors to the Raspberry Pi or Arduino. This might require some soldering or what is called
+  a bread-board. But be careful before choosing a project
+  involving new hardware: these devices
+  can be destroyed (if “Vin connected to GND” or “drawing more than 30mA from a GPIO”
+  does not make sense to you, you should probably stay away from such a project). 
+  +
+  
+  +
+  +
+  +  
+  
+
+ 
+  Skills: 
+  Well, you must be a hacker; happy to make things. Your desk might look like the photo below on the left.
+  The photo below in the middle shows an earlier student project which connects wirelessly a wearable Arduino (packaged
+  in a "self-3d-printed" watch) to a Raspberry Pi seen in the background. The Arduino in the foreground takes
+  measurements of 
+  heart rate and body temperature; the Raspberry Pi collects this data and makes it accessible via a simple
+  web-service. The picture on the right is another project that implements an airmouse using an Arduino.
+
+  +  +
+  +
+  + 
+
+
+    A really cool project using a toy helicopter and two Raspberry Pi's was done by Nikolaos Kyknas. He transformed
+    an off-the-shelf toy helicopter into an autonomous flying machine. He attached a Raspberry Pi Zero and an ultrasound
+    sensor to the helicopter for measuring the distance from ground. Another Raspberry Pi is attached to the “ground control
+    unit” in order to give instructions to the throttle of the helicopter. Both Raspberry Pi's communicate over WiFi for calculating
+    the next flight instruction. The goal is to find and maintain a steady altitude. Sounds simple? Well, not so fast: Rest assured there are
+    many thorny issues! First you need to get the balance of the helicopter plus Raspberry Pi plus its power source just right,
+    otherwise the helicopter will simply take off in random directions. Also the flight instructions need to be just right,
+    otherwise the helicopter would at best “oscillate” around the set altitude, but never be steady. To solve this problem, 
+    Nikolaos used exactly the same algorithm that keeps cars at a steady pace when in cruise control. 
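Cruise control is commonly implemented with a PID (proportional-integral-derivative) controller. Assuming that this is the kind of algorithm meant here, a minimal Python sketch looks as follows. The class name `PID`, its parameters and the gains in the example are made up for illustration and are not taken from Nikolaos' project.

```python
class PID:
    """Textbook PID controller; the gains kp, ki and kd must be tuned by hand."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.previous_error = None

    def update(self, error, dt):
        """Return the control signal for the current error (dt must be positive)."""
        # Integral term: accumulated error over time.
        self.integral += error * dt
        # Derivative term: rate of change of the error (zero on the first call).
        if self.previous_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.previous_error) / dt
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


if __name__ == "__main__":
    pid = PID(kp=2.0, ki=0.5, kd=1.0)
    # error = target altitude minus measured altitude (hypothetical values)
    print(pid.update(1.0, 1.0))
    print(pid.update(0.5, 1.0))
```

In the helicopter setting the error would be the difference between the set altitude and the ultrasound measurement, and the output would adjust the throttle. A badly tuned derivative or integral gain is one way to end up with the oscillation described above.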
+
+    +      
+      
+      
+
+ [CU5] An Infrastructure for Displaying and Animating Code in a Web-Browser+  
+
+  Description:
+  The project aim is to implement an infrastructure for displaying and
+  animating code in a web-browser. The infrastructure should be agnostic
+  with respect to the programming language, but should be configurable.
+  I envisage something smaller than the projects 
+  here (for Python),
+  here (for Java),
+  here (for multiple languages),
+  here (for HTML),
+  here (for JavaScript),
+  and here (for Scala).
+  +
+ 
+  The tasks in this project are (1) to lex and parse languages and (2) to write an interpreter.
+  The goal is to implement this as much as possible in a language-agnostic fashion.
+  +
+ 
+  Skills: 
+  Good skills in lexing and language parsing, as well as being fluent with web programming (for
+  example JavaScript).
+  +
+
+ [CU6] Proving the Correctness of Programs+
+
+ I am one of the main developers of the interactive theorem prover
+ Isabelle. This theorem prover
+ has been used to establish the correctness of some quite large
+ programs (for example an operating system).
+ Together with colleagues from Nanjing, I used this theorem prover to establish the correctness of a
+ scheduling algorithm, called
+ Priority Inheritance,
+ for real-time operating systems. This scheduling algorithm is part of the operating
+ system that drives, for example, the 
+ Mars rovers.
+ Actually, the very first Mars rover mission in 1997 did not have this
+ algorithm switched on and it almost caused a catastrophic mission failure (see
+ this youtube video here
+ for an explanation of what happened).
+ We were able to prove the correctness of this algorithm, but were also able to
+ establish the correctness of some optimisations in this
+ paper.
+ +
+ On a much smaller scale, there are a few small programs and underlying algorithms where it
+ is not really understood whether they always compute a correct result (for example the
+ regular expression matcher by Sulzmann and Lu in project [CU1]). The aim of this
+ project is to completely specify an algorithm in Isabelle and then prove it correct (that is,
+ it always computes the correct result).
++
+ 
+  Skills: 
+  This project is for a very good student with a knack for theoretical things and formal reasoning.
+  +
+ [CU7] Anything Security Related that is Interesting+  
+
+If you have your own project that is related to security (must be
+something interesting), please propose it. We can then have a look
+whether it would be suitable for a project.
++
+ [CU8] Anything Interesting in the Areas+  
+
++
+
+
+Elm (a reactive functional language for animating webpages; have a look at the cool examples, or here for an introduction)
+SMLtoJS (an ML compiler to JavaScript; or anything else related to
+  sane languages that compile to JavaScript)
+Any statistical data related to Bitcoins (in the spirit of this
+paper or
+  this one; this will probably require extensive knowledge of C or another
+  heavy-duty programming language)
+Anything related to programming languages and formal methods (like
+  static program analysis)  
+Anything related to low-cost, hands-on hardware like Raspberry Pi, Arduino,
+  Cubieboard
+Anything related to unikernel operating systems, like
+  Xen or
+  Mirage OS
+Any kind of applied hacking, for example the Arduino-based keylogger described
+   here
+Anything related to code books, like this
+   one
+ Earlier Projects+
+ I am also open to project suggestions from you. You might find some inspiration from my earlier projects:
+ BSc 2012/13, 
+ MSc 2012/13, 
+ BSc 2013/14,
+ MSc 2013/14, 
+ BSc 2014/15,
+ MSc 2014/15, 
+ BSc 2015/16,
+ MSc 2015/16, 
+ BSc 2016/17,
+ MSc 2016/17,
+ BSc 2017/18,
+ MSc 2017/18
+ | 
  
+