diff -r 3a2fa69ea675 -r 201c2c6d8696 handouts/ho01.tex --- a/handouts/ho01.tex Mon Oct 20 00:28:03 2014 +0100 +++ b/handouts/ho01.tex Tue Oct 28 06:01:00 2014 +0000 @@ -18,22 +18,24 @@ patterns. For example in programming code we need to identify what are the keywords, what are the identifiers etc. A pattern for identifiers could be stated as: they start with a letter, -followed by zero or more letters, numbers and the underscore. +followed by zero or more letters, numbers and underscores. Also often we face the problem that we are given a string (for example some user input) and want to know whether it matches a particular pattern. In this way we can exclude user input that would otherwise have nasty effects on our program (crashing it -or going into an infinite loop, if not worse). \defn{Regular -expressions} help with conveniently specifying such patterns. -The idea behind regular expressions is that they are a simple -method for describing languages (or sets of strings)\ldots at -least languages we are interested in in computer science. For -example there is no convenient regular expression for -describing the English language short of enumerating all -English words. But they seem useful for describing for example -email addresses.\footnote{See ``8 Regular Expressions You -Should Know'' \url{http://goo.gl/5LoVX7}} Consider the -following regular expression +or making it go into an infinite loop, if not worse). + +\defn{Regular expressions} help with conveniently specifying +such patterns. The idea behind regular expressions is that +they are a simple method for describing languages (or sets of +strings)\ldots at least languages we are interested in in +computer science. For example there is no convenient regular +expression for describing the English language short of +enumerating all English words. But they seem useful for +describing for example email addresses.\footnote{See ``8 +Regular Expressions You Should Know'' +\url{http://goo.gl/5LoVX7}} Consider the following regular +expression \begin{equation}\label{email} \texttt{[a-z0-9\_.-]+ @ [a-z0-9.-]+ . [a-z.]\{2,6\}} @@ -113,9 +115,9 @@ the regular expressions in the web-crawlers shown Figures \ref{crawler1}, \ref{crawler2} and \ref{crawler3}.\footnote{There is an interesting twist in the -web-scraber where \pcode{re*?} is used instead of \pcode{re*}.} Note, -however, the regular expression for http-addresses in -web-pages is meant to be +web-scraper where \pcode{re*?} is used instead of +\pcode{re*}.} Note, however, the regular expression for +http-addresses in web-pages is meant to be \[ \pcode{"https?://[^"]*"} @@ -149,32 +151,25 @@ regular expression matching in Python and in Ruby. \begin{center} -\begin{tikzpicture}[y=.09cm, x=.15cm] - %axis - \draw (0,0) -- coordinate (x axis mid) (30,0); - \draw (0,0) -- coordinate (y axis mid) (0,30); - %ticks - \foreach \x in {0,5,...,30} - \draw (\x,1pt) -- (\x,-3pt) node[anchor=north] {\x}; - \foreach \y in {0,5,...,30} - \draw (1pt,\y) -- (-3pt,\y) node[anchor=east] {\y}; - %labels - \node[below=0.6cm] at (x axis mid) {number of \texttt{a}s}; - \node[rotate=90,left=0.9cm] at (y axis mid) {time in secs}; - %plots - \draw[color=blue] plot[mark=*] - file {re-python.data}; - \draw[color=brown] plot[mark=triangle*] - file {re-ruby.data}; - %legend - \begin{scope}[shift={(4,20)}] - \draw[color=blue] (0,0) -- - plot[mark=*] (0.25,0) -- (0.5,0) - node[right]{\small Python}; - \draw[yshift=-\baselineskip, color=brown] (0,0) -- - plot[mark=triangle*] (0.25,0) -- (0.5,0) - node[right]{\small Ruby}; - \end{scope} +\begin{tikzpicture} +\begin{axis}[xlabel={\pcode{a}s},ylabel={time in secs}, + enlargelimits=false, + xtick={0,5,...,30}, + xmax=33, + ymax=35, + ytick={0,5,...,30}, + scaled ticks=false, + axis lines=left, + width=7cm, + height=5cm, + legend entries={Python,Ruby}, + legend pos=north west, + legend cell align=left] +\addplot[blue,mark=*, mark options={fill=white}] + table {re-python.data}; +\addplot[brown,mark=triangle*, mark options={fill=white}] + table {re-ruby.data}; +\end{axis} \end{tikzpicture} \end{center} @@ -198,7 +193,7 @@ instances where the matching engine behaves badly and then mount a nice DoS-attack against your application. These attacks are already have their own name: -\emph{Regular Expression Denial of Servive Attack (ReDoS)}. +\emph{Regular Expression Denial of Service Attacks (ReDoS)}. It will be instructive to look behind the ``scenes'' to find out why Python and Ruby (and others) behave so badly when @@ -211,24 +206,20 @@ up to 12,000 in less than 10(!) seconds, see the graph below: \begin{center} -\begin{tikzpicture}[y=.09cm, x=.0006cm] - %axis - \draw (0,0) -- coordinate (x axis mid) (12000,0); - \draw (0,0) -- coordinate (y axis mid) (0,30); - %ticks - \foreach \x in {0,2000,...,12000} - \draw (\x,1pt) -- (\x,-3pt) node[anchor=north] {\x}; - \foreach \y in {0,5,...,30} - \draw (1pt,\y) -- (-3pt,\y) node[anchor=east] {\y}; - %labels - \node[below=0.6cm] at (x axis mid) {number of \texttt{a}s}; - \node[rotate=90,left=0.9cm] at (y axis mid) {time in secs}; - - %plots - \draw[color=green] plot[mark=square*, mark options={fill=white} ] - file {re2b.data}; - \draw[color=black] plot[mark=square*, mark options={fill=white} ] - file {re3.data}; +\begin{tikzpicture} + \begin{axis}[xlabel={\pcode{a}s},ylabel={time in secs}, + enlargelimits=false, + xtick={0,3000,...,12000}, + xmax=12500, + ymax=35, + ytick={0,5,...,30}, + scaled ticks=false, + axis lines=left, + width=9cm, + height=5cm] +\addplot[green,mark=square*,mark options={fill=white}] table {re2b.data}; +\addplot[black,mark=square*,mark options={fill=white}] table {re3.data}; +\end{axis} \end{tikzpicture} \end{center}