handouts/ho01.tex
changeset 407 4b454a6d1814
parent 404 245d302791c7
child 415 4ae59fd3b174
equal deleted inserted replaced
406:0a42d73e795b 407:4b454a6d1814
   218 \texttt{a?a?a?...a?a?aaa...aa}, where the \texttt{a?} and
   218 \texttt{a?a?a?...a?a?aaa...aa}, where the \texttt{a?} and
   219 \texttt{a} each occur $n$ times. More such test cases can be
   219 \texttt{a} each occur $n$ times. More such test cases can be
   220 found at \url{http://www.computerbytesman.com/redos/}.}
   220 found at \url{http://www.computerbytesman.com/redos/}.}
   221 Admittedly, this regular expression is carefully chosen to
   221 Admittedly, this regular expression is carefully chosen to
   222 exhibit this exponential behaviour, but similar ones occur
   222 exhibit this exponential behaviour, but similar ones occur
   223 more often than one wants in ``real life''. They are sometimes
   223 more often than one wants in ``real life''. For example, on 20
   224 called \emph{evil regular expressions} because they have the
   224 July 2016 a similar regular expression brought the webpage
       
   225 \href{http://stackexchange.com}{Stack Exchange} to its knees:
       
   226 
       
   227 \begin{center}
       
   228 \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
       
   229 \end{center}
       
   230 
       
   231 \noindent
       
   232 Such troublesome regular expressions are sometimes called
       
   233 \emph{evil regular expressions} because they have the
   225 potential to make regular expression matching engines to
   234 potential to make regular expression matching engines to
   226 topple over, like in Python and Ruby. The problem with evil
   235 topple over, like in Python and Ruby. The problem with evil
   227 regular expressions is that they can have some serious
   236 regular expressions is that they can have some serious
   228 consequences, for example, if you use them in your
   237 consequences, for example, if you use them in your
   229 web-application. The reason is that hackers can look for these
   238 web-application. The reason is that hackers can look for these
   230 instances where the matching engine behaves badly and then
   239 instances where the matching engine behaves badly and then
   231 mount a nice DoS-attack against your application. These
   240 mount a nice DoS-attack against your application. These
   232 attacks are already have their own name: 
   241 attacks are already have their own name: \emph{Regular
   233 \emph{Regular Expression Denial of Service Attacks (ReDoS)}.
   242 Expression Denial of Service Attacks (ReDoS)}.
   234 
   243 
   235 It will be instructive to look behind the ``scenes'' to find
   244 It will be instructive to look behind the ``scenes'' to find
   236 out why Python and Ruby (and others) behave so badly when
   245 out why Python and Ruby (and others) behave so badly when
   237 matching with evil regular expressions. But we will also look
   246 matching with evil regular expressions. But we will also look
   238 at a relatively simple algorithm that solves this problem much
   247 at a relatively simple algorithm that solves this problem much