handouts/ho01.tex
changeset 415 4ae59fd3b174
parent 407 4b454a6d1814
child 416 357c395ae838
--- a/handouts/ho01.tex	Mon Aug 22 23:05:43 2016 +0200
+++ b/handouts/ho01.tex	Tue Aug 23 12:26:13 2016 +0200
@@ -24,7 +24,7 @@
 %% emacs regexes
 %% https://www.gnu.org/software/emacs/manual/html_node/elisp/Regular-Expressions.html
 
-%% reasons for a new prgramming language
+%%  reasons for a new prgramming language
 %% http://beautifulracket.com
 
 \begin{document}
@@ -182,13 +182,17 @@
 are many libraries implementing regular expressions. I am sure
 you have come across them before (remember PRA?). Why on earth
 then is there any interest in studying them again in depth in
-this module? Well, one answer is in the following graph about
-regular expression matching in Python and in Ruby.
+this module? Well, one answer is in the following two graphs about
+regular expression matching in Python, Ruby and Java.
 
 \begin{center}
+\begin{tabular}{@{}cc@{}}
 \begin{tikzpicture}
 \begin{axis}[
-    xlabel={strings of {\tt a}s},
+    title={Graph: $\texttt{a?\{n\}\,a{\{n\}}}$ and strings 
+           $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$},
+    xlabel={$n$},
+    x label style={at={(1.05,0.0)}},
     ylabel={time in secs},
     enlargelimits=false,
     xtick={0,5,...,30},
@@ -197,7 +201,7 @@
     ytick={0,5,...,30},
     scaled ticks=false,
     axis lines=left,
-    width=7cm,
+    width=5.5cm,
     height=4.5cm, 
     legend entries={Python,Ruby},  
     legend pos=north west,
@@ -208,17 +212,43 @@
   table {re-ruby.data};  
 \end{axis}
 \end{tikzpicture}
+&
+\begin{tikzpicture}
+\begin{axis}[
+    title={Graph: $\texttt{(a*)*\,b}$ and strings 
+           $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$},
+    xlabel={$n$},
+    x label style={at={(1.05,0.0)}},
+    ylabel={time in secs},
+    enlargelimits=false,
+    xtick={0,5,...,30},
+    xmax=33,
+    ymax=35,
+    ytick={0,5,...,30},
+    scaled ticks=false,
+    axis lines=left,
+    width=5.5cm,
+    height=4.5cm, 
+    legend entries={Java},  
+    legend pos=north west,
+    legend cell align=left]
+\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
+\end{axis}
+\end{tikzpicture}
+\end{tabular}
 \end{center}
 
-\noindent This graph shows that Python needs approximately 29
+\noindent This first graph shows that Python needs approximately 29
 seconds for finding out whether a string of 28 \texttt{a}s
-matches the regular expression \texttt{[a?]\{28\}[a]\{28\}}.
+matches the regular expression \texttt{a?\{28\}\,a\{28\}}.
 Ruby is even slightly worse.\footnote{In this example Ruby
 uses the slightly different regular expression
 \texttt{a?a?a?...a?a?aaa...aa}, where the \texttt{a?} and
 \texttt{a} each occur $n$ times. More such test cases can be
-found at \url{http://www.computerbytesman.com/redos/}.}
-Admittedly, this regular expression is carefully chosen to
+found at \url{http://www.computerbytesman.com/redos/}.} Simlarly, Java needs approximately
+30 seconds to find out that the regular expression $\texttt{(a*)*\,b}$
+does not match strings of 28 \texttt{a}s.
+Admittedly, these regular expressions are carefully chosen to
 exhibit this exponential behaviour, but similar ones occur
 more often than one wants in ``real life''. For example, on 20
 July 2016 a similar regular expression brought the webpage
@@ -229,17 +259,17 @@
 \end{center}
 
 \noindent
-Such troublesome regular expressions are sometimes called
-\emph{evil regular expressions} because they have the
-potential to make regular expression matching engines to
-topple over, like in Python and Ruby. The problem with evil
-regular expressions is that they can have some serious
-consequences, for example, if you use them in your
-web-application. The reason is that hackers can look for these
-instances where the matching engine behaves badly and then
-mount a nice DoS-attack against your application. These
-attacks are already have their own name: \emph{Regular
-Expression Denial of Service Attacks (ReDoS)}.
+Such troublesome regular expressions are sometimes called \emph{evil
+  regular expressions} because they have the potential to make regular
+expression matching engines to topple over, like in Python, Ruby and
+Java. This `toopling over' is also sometimes called \emph{catastrophic
+  backtracking}.  The problem with evil regular expressions is that
+they can have some serious consequences, for example, if you use them
+in your web-application. The reason is that hackers can look for these
+instances where the matching engine behaves badly and then mount a
+nice DoS-attack against your application. These attacks are already
+have their own name: \emph{Regular Expression Denial of Service
+  Attacks (ReDoS)}.
 
 It will be instructive to look behind the ``scenes'' to find
 out why Python and Ruby (and others) behave so badly when
@@ -254,23 +284,53 @@
 \begin{center}
 \begin{tikzpicture}
   \begin{axis}[
-    xlabel={strings of \pcode{a}s},
+    title={Graph: $\texttt{a?\{n\}\,a{\{n\}}}$ and strings 
+           $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$},
+    xlabel={$n$},
+    x label style={at={(1.05,0.0)}},
     ylabel={time in secs},
     enlargelimits=false,
     xtick={0,3000,...,12000},
-    xmax=12500,
-    ymax=35,
-    ytick={0,5,...,30},
+    xmax=13000,
+    ymax=12,
+    ytick={0,5,10},
     scaled ticks=false,
     axis lines=left,
-    width=9cm,
-    height=4.5cm]
-\addplot[green,mark=square*,mark options={fill=white}] table {re2b.data};
+    width=7cm,
+    height=4.5cm,
+    legend entries={Our Algorithm V1, Our Algorithm V2},
+    legend pos=outer north east]
+\addplot[green,mark=square*,mark options={fill=white}] table {re2.data};
 \addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
 \end{axis}
 \end{tikzpicture}
 \end{center}
 
+\noindent And in the case of the regular expression $\texttt{(a*)*\,b}$
+and strings of \texttt{a}s we will beat Java by factor of
+approximately 1,000,000 (note the scale on the $x$-axis). 
+
+\begin{center}
+\begin{tikzpicture}
+  \begin{axis}[
+    title={Graph: $\texttt{(a*)*\,b}$ and strings 
+           $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$},
+    xlabel={$n$},
+    x label style={at={(1.05,0.0)}},
+    ylabel={time in secs},
+    enlargelimits=false,
+    ymax=12,
+    ytick={0,5,10},
+    axis lines=left,
+    width=7cm,
+    height=4.5cm,
+    legend entries={Our Algorithm V2},
+    legend pos=outer north east]
+\addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
+\end{axis}
+\end{tikzpicture}
+\end{center}
+
 \subsection*{Basic Regular Expressions}
 
 The regular expressions shown earlier for Scala, we