diff -r 569280c1f56c -r 24d9d64c2a95 ninems/ninems.tex --- a/ninems/ninems.tex Sat Jul 13 22:56:31 2019 +0100 +++ b/ninems/ninems.tex Tue Jul 16 22:18:18 2019 +0100 @@ -198,10 +198,31 @@ massive amounts of CPU-resources---causing web servers to grind to a halt. This happened when a post with 20,000 white spaces was submitted, but importantly the white spaces were neither at the -beginning nor at the end. As a result, the regular expression matching -engine needed to backtrack over many choices. The underlying problem is +beginning nor at the end. +As a result, the regular expression matching +engine needed to backtrack over many choices. +In this example, the time needed to process the string is not +exactly the classical exponential case, but rather $O(n^2)$ +with respect to the string length. But this is enough for the +home page of stackexchnge to respond not fast enough to +the load balancer, which thought that there must be some +attack and therefore stopped the servers from responding to +requests. This makes the whole site become unavailable. +Another fresh example that just came out of the oven +is the cloudfare outage incident. A poorly written +regular expression exhibited exponential behaviour +and exhausted CPUs that serve HTTP traffic on 2 July 2019, + resulting in a 27-minute outage. The regular expression contained +adjacent Kleene stars, which is the source of the exponential number +of backtracking possibilities. Although this outage has a series of +causes that came from different vulnerabilities within the system +of cloudfare, the heart of the problem lies within regular expression. +Such outages could be excluded from happening if the matching +algorithm is guaranteed to be fast.\footnote{\url{https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/}} + The underlying problem is that many ``real life'' regular expression matching engines do not use -DFAs for matching. This is because they support regular expressions that +DFAs for matching. +This is because they support regular expressions that are not covered by the classical automata theory, and in this more general setting there are quite a few research questions still unanswered and fast algorithms still need to be developed (for example