handouts/ho01.tex
changeset 563 bddf14e026b3
parent 562 fa6d00968b0e
child 570 617c3b0e0a81
equal deleted inserted replaced
562:fa6d00968b0e 563:bddf14e026b3
   246 \end{axis}
   246 \end{axis}
   247 \end{tikzpicture}
   247 \end{tikzpicture}
   248 \end{tabular}
   248 \end{tabular}
   249 \end{center}
   249 \end{center}
   250 
   250 
   251 \noindent This first graph shows that Python and Java need
   251 \noindent This first graph shows that Python and Java 8 need
   252 approximately 30 seconds to find out that the regular expression
   252 approximately 30 seconds to find out that the regular expression
   253 $\texttt{(a*)*\,b}$ does not match strings of 28 \texttt{a}s.
   253 $\texttt{(a*)*\,b}$ does not match strings of 28 \texttt{a}s.
   254 Similarly, the second shows that Python needs approximately 29 seconds
   254 Similarly, the second shows that Python needs approximately 29 seconds
   255 for finding out whether a string of 28 \texttt{a}s matches the regular
   255 for finding out whether a string of 28 \texttt{a}s matches the regular
   256 expression \texttt{a?\{28\}\,a\{28\}}.  Ruby is even slightly
   256 expression \texttt{a?\{28\}\,a\{28\}}.  Ruby is even slightly
   280 
   280 
   281 \begin{center}
   281 \begin{center}
   282 \url{http://davidvgalbraith.com/how-i-fixed-atom/}
   282 \url{http://davidvgalbraith.com/how-i-fixed-atom/}
   283 \end{center}
   283 \end{center}
   284 
   284 
       
   285 \noindent
   285 and when somebody tried to match web-addresses using regular
   286 and when somebody tried to match web-addresses using regular
   286 expressions
   287 expressions
   287 
   288 
   288 \begin{center}
   289 \begin{center}
   289 \url{https://www.tutorialdocs.com/article/regex-trap.html}
   290 \url{https://www.tutorialdocs.com/article/regex-trap.html}
   290 \end{center}  
   291 \end{center}  
   291 
   292 
   292 \noindent
   293 
   293 Such troublesome regular expressions are sometimes called \emph{evil
   294 Such troublesome regular expressions are sometimes called \emph{evil
   294   regular expressions} because they have the potential to make regular
   295   regular expressions} because they have the potential to make regular
   295 expression matching engines to topple over, like in Python, Ruby and
   296 expression matching engines to topple over, like in Python, Ruby and
   296 Java. This ``toppling over'' is also sometimes called
   297 Java 8. This ``toppling over'' is also sometimes called
   297 \emph{catastrophic backtracking}.  I have also seen the term
   298 \emph{catastrophic backtracking}.  I have also seen the term
   298 \emph{eternal matching} used for this.  The problem with evil regular
   299 \emph{eternal matching} used for this.  The problem with evil regular
   299 expressions is that they can have some serious consequences, for
   300 expressions is that they can have some serious consequences, for
   300 example, if you use them in your web-application. The reason is that
   301 example, if you use them in your web-application. The reason is that
   301 hackers can look for these instances where the matching engine behaves
   302 hackers can look for these instances where the matching engine behaves
   337 \end{axis}
   338 \end{axis}
   338 \end{tikzpicture}
   339 \end{tikzpicture}
   339 \end{center}
   340 \end{center}
   340 
   341 
   341 \noindent And in the case of the regular expression $\texttt{(a*)*\,b}$
   342 \noindent And in the case of the regular expression $\texttt{(a*)*\,b}$
   342 and strings of \texttt{a}s we will beat Java by factor of
   343 and strings of \texttt{a}s we will beat Java 8 by factor of
   343 approximately 1,000,000 (note the scale on the $x$-axis). 
   344 approximately 1,000,000 (note the scale on the $x$-axis). 
   344 
   345 
   345 \begin{center}
   346 \begin{center}
   346 \begin{tikzpicture}
   347 \begin{tikzpicture}
   347   \begin{axis}[
   348   \begin{axis}[
   569 r_1 \equiv r_2 \;\dn\; L(r_1) = L(r_2)
   570 r_1 \equiv r_2 \;\dn\; L(r_1) = L(r_2)
   570 \]
   571 \]
   571 
   572 
   572 \noindent That means two regular expressions are said to be
   573 \noindent That means two regular expressions are said to be
   573 equivalent if they match the same set of strings. That is
   574 equivalent if they match the same set of strings. That is
   574 there meaning is the same. Therefore we
   575 their meanings is the same. Therefore we
   575 do not really distinguish between the different regular
   576 do not really distinguish between the different regular
   576 expression $(r_1 + r_2) + r_3$ and $r_1 + (r_2 + r_3)$,
   577 expression $(r_1 + r_2) + r_3$ and $r_1 + (r_2 + r_3)$,
   577 because they are equivalent. I leave you to the question
   578 because they are equivalent. I leave you to the question
   578 whether
   579 whether
   579 
   580