afl-material: comparison handouts/ho01.tex

-:3a2fa69ea675
+:201c2c6d8696
 efficient general string search algorithm. But often we do
 \emph{not} just look for a particular string, but for string
 patterns. For example in programming code we need to identify
 what are the keywords, what are the identifiers etc. A pattern
 for identifiers could be stated as: they start with a letter,
-followed by zero or more letters, numbers and the underscore.
+followed by zero or more letters, numbers and underscores.
 Also often we face the problem that we are given a string (for
 example some user input) and want to know whether it matches a
 particular pattern. In this way we can exclude user input that
 would otherwise have nasty effects on our program (crashing it
-or going into an infinite loop, if not worse). \defn{Regular
+or making it go into an infinite loop, if not worse).
-expressions} help with conveniently specifying such patterns.
-The idea behind regular expressions is that they are a simple
+\defn{Regular expressions} help with conveniently specifying
-method for describing languages (or sets of strings)\ldots at
+such patterns. The idea behind regular expressions is that
-least languages we are interested in in computer science. For
+they are a simple method for describing languages (or sets of
-example there is no convenient regular expression for
+strings)\ldots at least languages we are interested in in
-describing the English language short of enumerating all
+computer science. For example there is no convenient regular
-English words. But they seem useful for describing for example
+expression for describing the English language short of
-email addresses.\footnote{See ``8 Regular Expressions You
+enumerating all English words. But they seem useful for
-Should Know'' \url{http://goo.gl/5LoVX7}} Consider the
+describing for example email addresses.\footnote{See ``8
-following regular expression
+Regular Expressions You Should Know''
+\url{http://goo.gl/5LoVX7}} Consider the following regular
+expression
 \begin{equation}\label{email}
 \texttt{[a-z0-9\_.-]+ @ [a-z0-9.-]+ . [a-z.]\{2,6\}}
 \end{equation}
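The following sketch shows how the identifier pattern described above and the e-mail pattern~(\ref{email}) might be written with Python's \pcode{re} module. It rests on three readings that are our own assumptions, not taken from the handout:

- "letter" means an ASCII letter;
- the spaces in~(\ref{email}) are only typographical;
- the \pcode{.} before the top-level domain is meant as a literal dot, so it is escaped with a backslash.

The names \pcode{IDENTIFIER}, \pcode{EMAIL}, \pcode{is_identifier} and \pcode{is_email} are made up for illustration.

```python
import re

# Identifiers: a letter, followed by zero or more letters, digits or
# underscores. Assumption: "letter" means an ASCII letter.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# The e-mail pattern from the equation above, with the typographical spaces
# removed and the dot before the top-level domain escaped (our reading).
EMAIL = re.compile(r"[a-z0-9_.-]+@[a-z0-9.-]+\.[a-z.]{2,6}")


def is_identifier(s: str) -> bool:
    """True if the whole of s is an identifier."""
    return IDENTIFIER.fullmatch(s) is not None


def is_email(s: str) -> bool:
    """True if the whole of s fits the e-mail pattern."""
    return EMAIL.fullmatch(s) is not None


if __name__ == "__main__":
    for word in ["x1_y", "1x", "_tmp"]:
        print(word, is_identifier(word))  # True, False, False
    for addr in ["john.doe@example.com", "John@example.com", "john@localhost"]:
        print(addr, is_email(addr))       # True, False, False
```

\pcode{fullmatch} is used because the question is whether the whole string fits the pattern. \pcode{re.search} would also accept strings that merely contain a match somewhere. Note that the pattern only allows lower-case letters, so \pcode{John@example.com} is rejected.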
 \noindent With this table you can figure out the purpose of
 the regular expressions in the web-crawlers shown in Figures
 \ref{crawler1}, \ref{crawler2} and
 \ref{crawler3}.\footnote{There is an interesting twist in the
-web-scraber where \pcode{re*?} is used instead of \pcode{re*}.} Note,
+web-scraper where \pcode{re*?} is used instead of
-however, the regular expression for http-addresses in
+\pcode{re*}.} Note, however, the regular expression for
-web-pages is meant to be
+http-addresses in web-pages is meant to be
 \[
 \pcode{"https?://[^"]*"}
 \]
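The sketch below applies this pattern with Python's \pcode{re.findall}. It also illustrates the twist from the footnote by comparing two variants of the pattern: a greedy \pcode{.*} and a lazy \pcode{.*?}. The helper \pcode{extract_urls} and the sample page are hypothetical and only serve as an illustration.

```python
import re
from typing import List

# The pattern for http-addresses given in the text.
HTTP_RE = re.compile(r'"https?://[^"]*"')
# For comparison: a greedy .* and a lazy (non-greedy) .*? variant.
GREEDY_RE = re.compile(r'"https?://.*"')
LAZY_RE = re.compile(r'"https?://.*?"')


def extract_urls(html: str) -> List[str]:
    """Return every quoted http(s) address in html, without the quotes."""
    return [m.strip('"') for m in HTTP_RE.findall(html)]


if __name__ == "__main__":
    page = '<a href="http://example.com/x">A</a> <a href="https://example.org">B</a>'
    print(extract_urls(page))       # ['http://example.com/x', 'https://example.org']
    print(GREEDY_RE.findall(page))  # a single long match spanning both links
    print(LAZY_RE.findall(page))    # the same two matches as HTTP_RE
```

Because \pcode{.*} is greedy, it runs to the last quote on the line and swallows both links in one match. Two changes stop the match at the first closing quote: excluding the quote character from the repeated part, as in the pattern above, or making the star lazy.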
 there any interest in studying them again in depth in this
 module? Well, one answer is in the following graph about
 regular expression matching in Python and in Ruby.
 \begin{center}
-\begin{tikzpicture}[y=.09cm, x=.15cm]
+\begin{tikzpicture}
-	%axis
+\begin{axis}[xlabel={\pcode{a}s},ylabel={time in secs},
-	\draw (0,0) -- coordinate (x axis mid) (30,0);
+enlargelimits=false,
-\draw (0,0) -- coordinate (y axis mid) (0,30);
+xtick={0,5,...,30},
-%ticks
+xmax=33,
-\foreach \x in {0,5,...,30}
+ymax=35,
-\draw (\x,1pt) -- (\x,-3pt) node[anchor=north] {\x};
+ytick={0,5,...,30},
-\foreach \y in {0,5,...,30}
+scaled ticks=false,
-\draw (1pt,\y) -- (-3pt,\y) node[anchor=east] {\y};
+axis lines=left,
-	%labels
+width=7cm,
-	\node[below=0.6cm] at (x axis mid) {number of \texttt{a}s};
+height=5cm,
-	\node[rotate=90,left=0.9cm] at (y axis mid) {time in secs};
+legend entries={Python,Ruby},
-	%plots
+legend pos=north west,
-	\draw[color=blue] plot[mark=*]
+legend cell align=left]
-		file {re-python.data};
+\addplot[blue,mark=*, mark options={fill=white}]
-	\draw[color=brown] plot[mark=triangle*]
+table {re-python.data};
-		file {re-ruby.data};
+\addplot[brown,mark=triangle*, mark options={fill=white}]
-%legend
+table {re-ruby.data};
-	\begin{scope}[shift={(4,20)}]
+\end{axis}
-	\draw[color=blue] (0,0) --
-		plot[mark=*] (0.25,0) -- (0.5,0)
-		node[right]{\small Python};
-	\draw[yshift=-\baselineskip, color=brown] (0,0) --
-		plot[mark=triangle*] (0.25,0) -- (0.5,0)
-		node[right]{\small Ruby};
-	\end{scope}
 \end{tikzpicture}
 \end{center}
 \noindent This graph shows that Python needs approximately 29
 seconds for finding out whether a string of 28 \texttt{a}s
 consequences, for example, if you use them in your
 web-application. The reason is that hackers can look for these
 instances where the matching engine behaves badly and then
 mount a nice DoS-attack against your application. These
 attacks already have their own name:
-\emph{Regular Expression Denial of Servive Attack (ReDoS)}.
+\emph{Regular Expression Denial of Service Attacks (ReDoS)}.
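Such bad behaviour can be observed directly with Python's own \pcode{re} module, as the minimal sketch below shows. This excerpt does not show which regular expression produced the timings in the graph above. The sketch therefore assumes the pattern \pcode{a?{n}a{n}}, written out as $n$ copies of \pcode{a?} followed by $n$ copies of \pcode{a}. This is a standard example that makes a backtracking matcher try exponentially many ways of splitting the input. The function names are hypothetical.

```python
import re
import time
from typing import Tuple


def evil_pattern(n: int) -> str:
    """Hypothetical 'evil' pattern a?{n}a{n}, spelled out as a?...a?a...a."""
    return "a?" * n + "a" * n


def time_match(n: int) -> Tuple[bool, float]:
    """Time Python's backtracking engine on a string of n copies of 'a'."""
    text = "a" * n
    start = time.perf_counter()
    matched = re.fullmatch(evil_pattern(n), text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Keep the upper bound small: the running time grows very quickly with n.
    for n in range(5, 21, 5):
        matched, secs = time_match(n)
        print(f"{n:2d} a's: matched={matched}, {secs:.4f} s")
```

Every call should report a successful match, because all the optional \pcode{a?} can match the empty string. On a backtracking engine, each extra \texttt{a} should roughly double the time. The exact numbers depend on the machine and the Python version.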
 It will be instructive to look behind the ``scenes'' to find
 out why Python and Ruby (and others) behave so badly when
 matching with evil regular expressions. But we will also look
 at a relatively simple algorithm that solves this problem much
 process strings of approximately 1,000 \texttt{a}s in 30
 seconds, while the second version will even be able to process
 up to 12,000 in less than 10(!) seconds, see the graph below:
 \begin{center}
-\begin{tikzpicture}[y=.09cm, x=.0006cm]
+\begin{tikzpicture}
-	%axis
+\begin{axis}[xlabel={\pcode{a}s},ylabel={time in secs},
-	\draw (0,0) -- coordinate (x axis mid) (12000,0);
+enlargelimits=false,
-\draw (0,0) -- coordinate (y axis mid) (0,30);
+xtick={0,3000,...,12000},
-%ticks
+xmax=12500,
-\foreach \x in {0,2000,...,12000}
+ymax=35,
-	\draw (\x,1pt) -- (\x,-3pt) node[anchor=north] {\x};
+ytick={0,5,...,30},
-\foreach \y in {0,5,...,30}
+scaled ticks=false,
-	\draw (1pt,\y) -- (-3pt,\y) node[anchor=east] {\y};
+axis lines=left,
-	%labels
+width=9cm,
-	\node[below=0.6cm] at (x axis mid) {number of \texttt{a}s};
+height=5cm]
-	\node[rotate=90,left=0.9cm] at (y axis mid) {time in secs};
+\addplot[green,mark=square*,mark options={fill=white}] table {re2b.data};
+\addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
-	%plots
+\end{axis}
-\draw[color=green] plot[mark=square*, mark options={fill=white} ]
-		file {re2b.data};
-	\draw[color=black] plot[mark=square*, mark options={fill=white} ]
-		file {re3.data};
 \end{tikzpicture}
 \end{center}
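This excerpt does not name the algorithm behind the graph above. One well-known technique that avoids backtracking altogether is matching with Brzozowski derivatives of regular expressions. The matcher works in three steps:

1. take the derivative of the regular expression with respect to each character of the string in turn;
2. simplify the result along the way by dropping \pcode{Zero}, \pcode{One} and duplicate alternatives;
3. at the end, check whether the final expression can match the empty string.

The sketch below is our own illustration of that technique. The constructor names and the particular simplification are our choices. It is not necessarily the algorithm, or the version, that was measured.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class Zero:  # matches no string at all
    pass

@dataclass(frozen=True)
class One:  # matches only the empty string
    pass

@dataclass(frozen=True)
class Char:  # matches exactly the character c
    c: str

@dataclass(frozen=True)
class Alt:  # r1 + r2
    r1: "Rexp"
    r2: "Rexp"

@dataclass(frozen=True)
class Seq:  # r1 followed by r2
    r1: "Rexp"
    r2: "Rexp"

@dataclass(frozen=True)
class Star:  # zero or more copies of r
    r: "Rexp"

Rexp = Union[Zero, One, Char, Alt, Seq, Star]

def alts(r: Rexp) -> List[Rexp]:
    """Flatten nested alternatives into a list of non-Alt parts."""
    return alts(r.r1) + alts(r.r2) if isinstance(r, Alt) else [r]

def mk_alt(parts: List[Rexp]) -> Rexp:
    """Alternative of all parts, dropping Zero and structural duplicates."""
    unique: List[Rexp] = []
    for p in parts:
        for q in alts(p):
            if not isinstance(q, Zero) and q not in unique:
                unique.append(q)
    if not unique:
        return Zero()
    result = unique[-1]
    for q in reversed(unique[:-1]):
        result = Alt(q, result)
    return result

def mk_seq(r1: Rexp, r2: Rexp) -> Rexp:
    """Sequence, simplified with 0.r = r.0 = 0 and 1.r = r.1 = r."""
    if isinstance(r1, Zero) or isinstance(r2, Zero):
        return Zero()
    if isinstance(r1, One):
        return r2
    return r1 if isinstance(r2, One) else Seq(r1, r2)

def nullable(r: Rexp) -> bool:
    """Can r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero and Char

def der(c: str, r: Rexp) -> Rexp:
    """Derivative of r with respect to c, simplified on the fly."""
    if isinstance(r, Char):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return mk_alt([der(c, p) for p in alts(r)])
    if isinstance(r, Seq):
        first = mk_seq(der(c, r.r1), r.r2)
        return mk_alt([first, der(c, r.r2)]) if nullable(r.r1) else first
    if isinstance(r, Star):
        return mk_seq(der(c, r.r), r)
    return Zero()  # Zero and One

def matches(r: Rexp, s: str) -> bool:
    """Derive r by each character of s in turn; no backtracking involved."""
    for c in s:
        r = der(c, r)
    return nullable(r)

def evil(n: int) -> Rexp:
    """Hypothetical test pattern a?{n} a{n}, spelled out as one sequence."""
    parts: List[Rexp] = [Alt(Char("a"), One())] * n + [Char("a")] * n
    result: Rexp = One()
    for r in reversed(parts):
        result = Seq(r, result)
    return result
```

For example, \pcode{matches(evil(20), "a" * 20)} returns \pcode{True}. \pcode{matches(evil(5), "a" * 4)} returns \pcode{False}, since \pcode{a?{5}a{5}} needs between five and ten \texttt{a}s. Because the matcher never backtracks, its running time depends on the size of the derivatives rather than on the number of ways of splitting the input. The simplification step is what keeps these derivatives small.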
 \subsection*{Basic Regular Expressions}

changeset 291	201c2c6d8696
parent 268	18bef085a7ca
child 306	fecffce112fa