218 \texttt{a?a?a?...a?a?aaa...aa}, where the \texttt{a?} and |
218 \texttt{a?a?a?...a?a?aaa...aa}, where the \texttt{a?} and |
219 \texttt{a} each occur $n$ times. More such test cases can be |
219 \texttt{a} each occur $n$ times. More such test cases can be |
220 found at \url{http://www.computerbytesman.com/redos/}.} |
220 found at \url{http://www.computerbytesman.com/redos/}.} |
221 Admittedly, this regular expression is carefully chosen to |
221 Admittedly, this regular expression is carefully chosen to |
222 exhibit this exponential behaviour, but similar ones occur |
222 exhibit this exponential behaviour, but similar ones occur |
223 more often than one wants in ``real life''. They are sometimes |
223 more often than one wants in ``real life''. For example, on 20 |
224 called \emph{evil regular expressions} because they have the |
224 July 2016 a similar regular expression brought the webpage |
|
225 \href{http://stackexchange.com}{Stack Exchange} to its knees: |
|
226 |
|
227 \begin{center} |
|
228 \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016} |
|
229 \end{center} |
|
230 |
|
231 \noindent |
|
232 Such troublesome regular expressions are sometimes called |
|
233 \emph{evil regular expressions} because they have the |
225 potential to make regular expression matching engines to |
234 potential to make regular expression matching engines to |
226 topple over, like in Python and Ruby. The problem with evil |
235 topple over, like in Python and Ruby. The problem with evil |
227 regular expressions is that they can have some serious |
236 regular expressions is that they can have some serious |
228 consequences, for example, if you use them in your |
237 consequences, for example, if you use them in your |
229 web-application. The reason is that hackers can look for these |
238 web-application. The reason is that hackers can look for these |
230 instances where the matching engine behaves badly and then |
239 instances where the matching engine behaves badly and then |
231 mount a nice DoS-attack against your application. These |
240 mount a nice DoS-attack against your application. These |
232 attacks are already have their own name: |
241 attacks are already have their own name: \emph{Regular |
233 \emph{Regular Expression Denial of Service Attacks (ReDoS)}. |
242 Expression Denial of Service Attacks (ReDoS)}. |
234 |
243 |
235 It will be instructive to look behind the ``scenes'' to find |
244 It will be instructive to look behind the ``scenes'' to find |
236 out why Python and Ruby (and others) behave so badly when |
245 out why Python and Ruby (and others) behave so badly when |
237 matching with evil regular expressions. But we will also look |
246 matching with evil regular expressions. But we will also look |
238 at a relatively simple algorithm that solves this problem much |
247 at a relatively simple algorithm that solves this problem much |