Binary file handouts/ho01.pdf has changed
--- a/handouts/ho01.tex Mon Oct 20 00:28:03 2014 +0100
+++ b/handouts/ho01.tex Tue Oct 28 06:01:00 2014 +0000
@@ -18,22 +18,24 @@
patterns. For example in programming code we need to identify
what are the keywords, what are the identifiers etc. A pattern
for identifiers could be stated as: they start with a letter,
-followed by zero or more letters, numbers and the underscore.
+followed by zero or more letters, numbers and underscores.
Also often we face the problem that we are given a string (for
example some user input) and want to know whether it matches a
particular pattern. In this way we can exclude user input that
would otherwise have nasty effects on our program (crashing it
-or going into an infinite loop, if not worse). \defn{Regular
-expressions} help with conveniently specifying such patterns.
-The idea behind regular expressions is that they are a simple
-method for describing languages (or sets of strings)\ldots at
-least languages we are interested in in computer science. For
-example there is no convenient regular expression for
-describing the English language short of enumerating all
-English words. But they seem useful for describing for example
-email addresses.\footnote{See ``8 Regular Expressions You
-Should Know'' \url{http://goo.gl/5LoVX7}} Consider the
-following regular expression
+or making it go into an infinite loop, if not worse).
+
+\defn{Regular expressions} help with conveniently specifying
+such patterns. The idea behind regular expressions is that
+they are a simple method for describing languages (or sets of
+strings)\ldots at least languages we are interested in in
+computer science. For example there is no convenient regular
+expression for describing the English language short of
+enumerating all English words. But they seem useful for
+describing for example email addresses.\footnote{See ``8
+Regular Expressions You Should Know''
+\url{http://goo.gl/5LoVX7}} Consider the following regular
+expression
\begin{equation}\label{email}
\texttt{[a-z0-9\_.-]+ @ [a-z0-9.-]+ . [a-z.]\{2,6\}}
@@ -113,9 +115,9 @@
the regular expressions in the web-crawlers shown Figures
\ref{crawler1}, \ref{crawler2} and
\ref{crawler3}.\footnote{There is an interesting twist in the
-web-scraber where \pcode{re*?} is used instead of \pcode{re*}.} Note,
-however, the regular expression for http-addresses in
-web-pages is meant to be
+web-scraper where \pcode{re*?} is used instead of
+\pcode{re*}.} Note, however, the regular expression for
+http-addresses in web-pages is meant to be
\[
\pcode{"https?://[^"]*"}
@@ -149,32 +151,25 @@
regular expression matching in Python and in Ruby.
\begin{center}
-\begin{tikzpicture}[y=.09cm, x=.15cm]
- %axis
- \draw (0,0) -- coordinate (x axis mid) (30,0);
- \draw (0,0) -- coordinate (y axis mid) (0,30);
- %ticks
- \foreach \x in {0,5,...,30}
- \draw (\x,1pt) -- (\x,-3pt) node[anchor=north] {\x};
- \foreach \y in {0,5,...,30}
- \draw (1pt,\y) -- (-3pt,\y) node[anchor=east] {\y};
- %labels
- \node[below=0.6cm] at (x axis mid) {number of \texttt{a}s};
- \node[rotate=90,left=0.9cm] at (y axis mid) {time in secs};
- %plots
- \draw[color=blue] plot[mark=*]
- file {re-python.data};
- \draw[color=brown] plot[mark=triangle*]
- file {re-ruby.data};
- %legend
- \begin{scope}[shift={(4,20)}]
- \draw[color=blue] (0,0) --
- plot[mark=*] (0.25,0) -- (0.5,0)
- node[right]{\small Python};
- \draw[yshift=-\baselineskip, color=brown] (0,0) --
- plot[mark=triangle*] (0.25,0) -- (0.5,0)
- node[right]{\small Ruby};
- \end{scope}
+\begin{tikzpicture}
+\begin{axis}[xlabel={\pcode{a}s},ylabel={time in secs},
+ enlargelimits=false,
+ xtick={0,5,...,30},
+ xmax=33,
+ ymax=35,
+ ytick={0,5,...,30},
+ scaled ticks=false,
+ axis lines=left,
+ width=7cm,
+ height=5cm,
+ legend entries={Python,Ruby},
+ legend pos=north west,
+ legend cell align=left]
+\addplot[blue,mark=*, mark options={fill=white}]
+ table {re-python.data};
+\addplot[brown,mark=triangle*, mark options={fill=white}]
+ table {re-ruby.data};
+\end{axis}
\end{tikzpicture}
\end{center}
@@ -198,7 +193,7 @@
instances where the matching engine behaves badly and then
mount a nice DoS-attack against your application. These
attacks are already have their own name:
-\emph{Regular Expression Denial of Servive Attack (ReDoS)}.
+\emph{Regular Expression Denial of Service Attacks (ReDoS)}.
It will be instructive to look behind the ``scenes'' to find
out why Python and Ruby (and others) behave so badly when
@@ -211,24 +206,20 @@
up to 12,000 in less than 10(!) seconds, see the graph below:
\begin{center}
-\begin{tikzpicture}[y=.09cm, x=.0006cm]
- %axis
- \draw (0,0) -- coordinate (x axis mid) (12000,0);
- \draw (0,0) -- coordinate (y axis mid) (0,30);
- %ticks
- \foreach \x in {0,2000,...,12000}
- \draw (\x,1pt) -- (\x,-3pt) node[anchor=north] {\x};
- \foreach \y in {0,5,...,30}
- \draw (1pt,\y) -- (-3pt,\y) node[anchor=east] {\y};
- %labels
- \node[below=0.6cm] at (x axis mid) {number of \texttt{a}s};
- \node[rotate=90,left=0.9cm] at (y axis mid) {time in secs};
-
- %plots
- \draw[color=green] plot[mark=square*, mark options={fill=white} ]
- file {re2b.data};
- \draw[color=black] plot[mark=square*, mark options={fill=white} ]
- file {re3.data};
+\begin{tikzpicture}
+ \begin{axis}[xlabel={\pcode{a}s},ylabel={time in secs},
+ enlargelimits=false,
+ xtick={0,3000,...,12000},
+ xmax=12500,
+ ymax=35,
+ ytick={0,5,...,30},
+ scaled ticks=false,
+ axis lines=left,
+ width=9cm,
+ height=5cm]
+\addplot[green,mark=square*,mark options={fill=white}] table {re2b.data};
+\addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
+\end{axis}
\end{tikzpicture}
\end{center}
Binary file handouts/ho02.pdf has changed
--- a/handouts/ho02.tex Mon Oct 20 00:28:03 2014 +0100
+++ b/handouts/ho02.tex Tue Oct 28 06:01:00 2014 +0000
@@ -23,7 +23,7 @@
\begin{axis}[xlabel={\pcode{a}s},ylabel={time in secs},
enlargelimits=false,
xtick={0,5,...,30},
- xmax=30,
+ xmax=33,
ymax=35,
ytick={0,5,...,30},
scaled ticks=false,
@@ -35,7 +35,7 @@
legend cell align=left]
\addplot[blue,mark=*, mark options={fill=white}]
table {re-python.data};
-\addplot[brown,mark=pentagon*, mark options={fill=white}]
+\addplot[brown,mark=triangle*, mark options={fill=white}]
table {re-ruby.data};
\end{axis}
\end{tikzpicture}
@@ -44,8 +44,9 @@
\begin{axis}[xlabel={\pcode{a}s},ylabel={time in secs},
enlargelimits=false,
xtick={0,3000,...,12000},
- xmax=12000,
- ymax=35,
ytick={0,5,...,30},
+ xmax=12500,
+ ymax=35,
+ ytick={0,5,...,30},
scaled ticks=false,
axis lines=left,
width=6.5cm,
@@ -291,16 +292,12 @@
Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\}
\]
-\noindent
-which essentially transforms a set of strings $A$ by filtering out all
-strings that do not start with $c$ and then strips off the $c$ from
-all the remaining strings. For example suppose $A = \{f\!oo, bar,
-f\!rak\}$ then
-\[
-Der\,f\,A = \{oo, rak\}\quad,\quad
-Der\,b\,A = \{ar\} \quad \text{and} \quad
-Der\,a\,A = \varnothing
-\]
+\noindent This operation essentially transforms a set of
+strings $A$ by filtering out all strings that do not start
+with $c$ and then strips off the $c$ from all the remaining
+strings. For example suppose $A = \{f\!oo, bar, f\!rak\}$ then
+\[ Der\,f\,A = \{oo, rak\}\quad,\quad Der\,b\,A = \{ar\} \quad
+\text{and} \quad Der\,a\,A = \varnothing \]
\noindent
Note that in the last case $Der$ is empty, because no string in $A$