ninems/ninems.tex
changeset 75 24d9d64c2a95
parent 72 83b021fc7d29
child 76 f575cf219377
equal deleted inserted replaced
73:569280c1f56c 75:24d9d64c2a95
   196 In this instance, a regular expression intended to just trim white
   196 In this instance, a regular expression intended to just trim white
   197 spaces from the beginning and the end of a line actually consumed
   197 spaces from the beginning and the end of a line actually consumed
   198 massive amounts of CPU-resources---causing web servers to
   198 massive amounts of CPU-resources---causing web servers to
   199 grind to a halt. This happened when a post with 20,000 white spaces was
   199 grind to a halt. This happened when a post with 20,000 white spaces was
   200 submitted, but importantly the white spaces were neither at the
   200 submitted, but importantly the white spaces were neither at the
   201 beginning nor at the end. As a result, the regular expression matching
   201 beginning nor at the end. 
   202 engine needed to backtrack over many choices. The underlying problem is
   202 As a result, the regular expression matching
       
   203 engine needed to backtrack over many choices.
       
   204 In this example, the time needed to process the string is not 
       
   205 exactly the classical exponential case, but rather $O(n^2)$
       
   206 with respect to the string length. But this is enough for the 
       
   207 home page of stackexchnge to respond not fast enough to
       
   208 the load balancer, which thought that there must be some
       
   209 attack and therefore stopped the servers from responding to 
       
   210 requests. This makes the whole site become unavailable.
       
   211 Another fresh example that just came out of the oven
       
   212 is the cloudfare outage incident. A poorly written
       
   213 regular expression exhibited exponential behaviour
       
   214 and exhausted CPUs that serve HTTP traffic on 2 July 2019,
       
   215  resulting in a 27-minute outage. The regular expression contained 
       
   216 adjacent Kleene stars, which is the source of the exponential number
       
   217 of backtracking possibilities. Although this outage has a series of
       
   218 causes that came from different vulnerabilities within the system 
       
   219 of cloudfare, the heart of the problem lies within regular expression.
       
   220 Such outages could be excluded from happening if the matching 
       
   221 algorithm is guaranteed to be fast.\footnote{\url{https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/}}
       
   222  The underlying problem is
   203 that many ``real life'' regular expression matching engines do not use
   223 that many ``real life'' regular expression matching engines do not use
   204 DFAs for matching. This is because they support regular expressions that
   224 DFAs for matching. 
       
   225 This is because they support regular expressions that
   205 are not covered by the classical automata theory, and in this more
   226 are not covered by the classical automata theory, and in this more
   206 general setting there are quite a few research questions still
   227 general setting there are quite a few research questions still
   207 unanswered and fast algorithms still need to be developed (for example
   228 unanswered and fast algorithms still need to be developed (for example
   208 how to treat bounded repetitions, negation and  back-references
   229 how to treat bounded repetitions, negation and  back-references
   209 efficiently).
   230 efficiently).