# HG changeset patch # User Christian Urban # Date 1538139258 -3600 # Node ID bddf14e026b3b813d503b69b59d361f0cb49066c # Parent fa6d00968b0e95d5dd9b44726f5197c259283777 updated diff -r fa6d00968b0e -r bddf14e026b3 handouts/ho01.pdf Binary file handouts/ho01.pdf has changed diff -r fa6d00968b0e -r bddf14e026b3 handouts/ho01.tex --- a/handouts/ho01.tex Tue Sep 25 21:57:14 2018 +0100 +++ b/handouts/ho01.tex Fri Sep 28 13:54:18 2018 +0100 @@ -248,7 +248,7 @@ \end{tabular} \end{center} -\noindent This first graph shows that Python and Java need +\noindent This first graph shows that Python and Java 8 need approximately 30 seconds to find out that the regular expression $\texttt{(a*)*\,b}$ does not match strings of 28 \texttt{a}s. Similarly, the second shows that Python needs approximately 29 seconds @@ -282,6 +282,7 @@ \url{http://davidvgalbraith.com/how-i-fixed-atom/} \end{center} +\noindent and when somebody tried to match web-addresses using regular expressions @@ -289,11 +290,11 @@ \url{https://www.tutorialdocs.com/article/regex-trap.html} \end{center} -\noindent + Such troublesome regular expressions are sometimes called \emph{evil regular expressions} because they have the potential to make regular expression matching engines to topple over, like in Python, Ruby and -Java. This ``toppling over'' is also sometimes called +Java 8. This ``toppling over'' is also sometimes called \emph{catastrophic backtracking}. I have also seen the term \emph{eternal matching} used for this. The problem with evil regular expressions is that they can have some serious consequences, for @@ -339,7 +340,7 @@ \end{center} \noindent And in the case of the regular expression $\texttt{(a*)*\,b}$ -and strings of \texttt{a}s we will beat Java by factor of +and strings of \texttt{a}s we will beat Java 8 by factor of approximately 1,000,000 (note the scale on the $x$-axis). \begin{center} @@ -571,7 +572,7 @@ \noindent That means two regular expressions are said to be equivalent if they match the same set of strings. That is -there meaning is the same. Therefore we +their meanings is the same. Therefore we do not really distinguish between the different regular expression $(r_1 + r_2) + r_3$ and $r_1 + (r_2 + r_3)$, because they are equivalent. I leave you to the question diff -r fa6d00968b0e -r bddf14e026b3 progs/catastrophic.rb --- a/progs/catastrophic.rb Tue Sep 25 21:57:14 2018 +0100 +++ b/progs/catastrophic.rb Fri Sep 28 13:54:18 2018 +0100 @@ -1,14 +1,14 @@ # A case of catastrophic backtracking in Ruby #--------------------------------------------- +# example provided by Daniel Baldwin # +# # regex: (a?){n} a{n} # strings: aa... # -# example provided by Daniel Baldwin +# run on the command line with: # -# call with: -# -# > ruby catastrophic.rb +# $> ruby catastrophic.rb # nums = (1..1000) diff -r fa6d00968b0e -r bddf14e026b3 progs/re1.scala --- a/progs/re1.scala Tue Sep 25 21:57:14 2018 +0100 +++ b/progs/re1.scala Fri Sep 28 13:54:18 2018 +0100 @@ -131,6 +131,7 @@ // derivatives might result into bigger and bigger // regular expressions...here is an example for this: +// (a+b)* o a o b o (a+b)* val BIG_aux = STAR(ALT(CHAR('a'), CHAR('b'))) val BIG = SEQ(BIG_aux, SEQ(CHAR('a'),SEQ(CHAR('b'), BIG_aux)))