msc-projects-15.html
changeset 384 27b7af6a00e5
parent 383 8787b77c9472
child 385 046a49edbeb8
383:8787b77c9472 384:27b7af6a00e5
    66   <p>The background for this project is that some regular expressions are 
    66   <p>The background for this project is that some regular expressions are 
    67   &ldquo;<A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">evil</A>&rdquo;
    67   &ldquo;<A HREF="http://en.wikipedia.org/wiki/ReDoS#Examples">evil</A>&rdquo;
    68   and can &ldquo;stab you in the back&rdquo; according to
    68   and can &ldquo;stab you in the back&rdquo; according to
    69   this <A HREF="http://peterscott.github.io/2013/01/17/regular-expressions-will-stab-you-in-the-back/">blog post</A>.
    69   this <A HREF="http://peterscott.github.io/2013/01/17/regular-expressions-will-stab-you-in-the-back/">blog post</A>.
    70   For example, if you use in <A HREF="http://www.python.org">Python</A> or 
    70   For example, if you use in <A HREF="http://www.python.org">Python</A> or 
    71   in <A HREF="http://www.ruby-lang.org/en/">Ruby</A> (or also in a number of other mainstream programming languages according to this
    71   in <A HREF="http://www.ruby-lang.org/en/">Ruby</A> (or also in a number of other mainstream programming languages) the 
    72   <A HREF="http://www.computerbytesman.com/redos/">blog</A>) the 
       
    73   innocent-looking regular expression <code>a?{28}a{28}</code> and match it, say, against the string 
    72   innocent-looking regular expression <code>a?{28}a{28}</code> and match it, say, against the string 
    74   <code>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</code> (that is 28 <code>a</code>s), you will soon notice that your CPU usage goes to 100%. In fact,
    73   <code>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</code> (that is 28 <code>a</code>s), you will soon notice that your CPU usage goes to 100%. In fact,
    75   Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself:
    74   Python and Ruby need approximately 30 seconds of hard work for matching this string. You can try it for yourself:
    76   <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/re.py">re.py</A> (Python version) and 
    75   <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/re.py">re.py</A> (Python version) and 
    77   <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/re.rb">re.rb</A> 
    76   <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/re.rb">re.rb</A> 
    81   <A HREF="http://www.scala-lang.org/">Scala</A> (and also Java) are almost immune to such
    80   <A HREF="http://www.scala-lang.org/">Scala</A> (and also Java) are almost immune to such
    82   attacks as they can deal with strings of up to 4,300 <code>a</code>s in less than a second. But if you scale
    81   attacks as they can deal with strings of up to 4,300 <code>a</code>s in less than a second. But if you scale
    83   the regular expression and string further to, say, 4,600 <code>a</code>s, then you get a <code>StackOverflowError</code> 
    82   the regular expression and string further to, say, 4,600 <code>a</code>s, then you get a <code>StackOverflowError</code> 
    84   potentially crashing your program. Moreover (besides the "minor" problem of being painfully slow) according to this
    83   potentially crashing your program. Moreover (besides the "minor" problem of being painfully slow) according to this
    85   <A HREF="http://www.haskell.org/haskellwiki/Regex_Posix">report</A>
    84   <A HREF="http://www.haskell.org/haskellwiki/Regex_Posix">report</A>
    86   nearly all POSIX regular expression matchers are actually buggy.
    85   nearly all regular expression matchers using the POSIX rules are actually buggy.
    87   </p>
    86   </p>
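The slowdown described in the paragraph above can be reproduced on a small scale with the sketch below. It is not the contents of the linked re.py; the helper names `evil_pattern` and `time_match` are made up for illustration. It writes the repetitions out explicitly (n copies of `a?` followed by n copies of `a`) so that it does not depend on how a particular engine parses `a?{n}`, and it stays at small n so that it finishes quickly.

```python
import re
import time


def evil_pattern(n):
    # Hypothetical helper: n optional a's followed by n mandatory a's,
    # i.e. the pattern (a?){n} a{n} with the repetitions written out.
    return "a?" * n + "a" * n


def time_match(n):
    # Match the pattern for size n against a string of exactly n a's
    # and report whether it matched and how long the match took.
    pattern = re.compile(evil_pattern(n))
    text = "a" * n
    start = time.perf_counter()
    matched = pattern.fullmatch(text) is not None
    elapsed = time.perf_counter() - start
    return matched, elapsed


if __name__ == "__main__":
    # Small sizes only: with a backtracking matcher the running time
    # is expected to grow roughly exponentially in n.
    for n in range(10, 21, 2):
        matched, secs = time_match(n)
        print(f"n={n:2d} matched={matched} time={secs:.4f}s")
```

Every input here does match, so the cost comes purely from how the matcher searches for the match. Raising n towards 28 should approach the running times quoted above, although the exact figures depend on the machine and the Python version.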
    88 
    87 
    89   <p>
    88   <p>
    90   On a rainy afternoon, I implemented 
    89   On a rainy afternoon, I implemented 
    91   <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/re3.scala">this</A> 
    90   <A HREF="http://www.dcs.kcl.ac.uk/staff/urbanc/cgi-bin/repos.cgi/afl-material/raw-file/tip/progs/re3.scala">this</A> 
   122   </p>
   121   </p>
   123 
   122 
   124   <p>
   123   <p>
   125   <B>Literature:</B> 
   124   <B>Literature:</B> 
   126   The place to start with this project is obviously this
   125   The place to start with this project is obviously this
   127   <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">paper</A>.
   126   <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/regex-parsing-derivatives.pdf">paper</A>
       
   127   and this <A HREF="http://www.home.hs-karlsruhe.de/~suma0002/publications/ppdp12-part-deriv-sub-match.pdf">one</A>.
   128   Traditional methods for regular expression matching are explained
   128   Traditional methods for regular expression matching are explained
   129   in the Wikipedia articles 
   129   in the Wikipedia articles 
   130   <A HREF="http://en.wikipedia.org/wiki/DFA_minimization">here</A> and 
   130   <A HREF="http://en.wikipedia.org/wiki/DFA_minimization">here</A> and 
   131   <A HREF="http://en.wikipedia.org/wiki/Powerset_construction">here</A>.
   131   <A HREF="http://en.wikipedia.org/wiki/Powerset_construction">here</A>.
   132   The authoritative <A HREF="http://infolab.stanford.edu/~ullman/ialc.html">book</A>
   132   The authoritative <A HREF="http://infolab.stanford.edu/~ullman/ialc.html">book</A>
   619 </TD>
   619 </TD>
   620 </TR>
   620 </TR>
   621 </TABLE>
   621 </TABLE>
   622 
   622 
   623 <P>
   623 <P>
   624 <!-- hhmts start --> Last modified: Sat Nov 21 11:45:43 GMT 2015 <!-- hhmts end -->
   624 <!-- hhmts start --> Last modified: Sat Nov 21 11:58:49 GMT 2015 <!-- hhmts end -->
   625 <a href="http://validator.w3.org/check/referer">[Validate this page.]</a>
   625 <a href="http://validator.w3.org/check/referer">[Validate this page.]</a>
   626 </BODY>
   626 </BODY>
   627 </HTML>
   627 </HTML>