--- a/handouts/ho01.tex Wed Jul 17 11:17:52 2019 +0100
+++ b/handouts/ho01.tex Thu Jul 25 14:39:37 2019 +0100
@@ -36,13 +36,15 @@
% https://gcc.godbolt.org
%https://www.youtube.com/watch?v=gmhMQfFQu20
+
+
\begin{document}
\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018, 2019}
\section*{Handout 1}
The first part of this module is about text processing, be it for
-web-crawlers, compilers, dictionaries, DNA-data, ad filters and so on.
+web-crawlers, compilers, dictionaries, DNA-data, spam-filters and so on.
When looking for a particular string, say \pcode{"foobar"}, in a large
text we can use the Knuth-Morris-Pratt algorithm, which is currently the
most efficient general string search algorithm. But often we do
@@ -59,14 +61,14 @@
Often we also face the problem that we are given a string, for example
some user input, and we want to know whether it matches a particular
-pattern---be it an email address, for example. In this way we can
+pattern---is it an email address, for example. In this way we can
exclude user input that would otherwise have nasty effects on our
program (crashing it or making it go into an infinite loop, if not
-worse). This kind of ``inspecting'' mechanism is implemented in popular
-network security tools such as Snort and
+worse). This kind of ``inspecting'' mechanism is also implemented in
+popular network security tools such as Snort and
Bro.\footnote{\url{www.snort.org}, \url{www.bro.org}} They scan incoming
-network traffic for computer viruses or malicious packets. Also
-filtering out spam usually involves scanning for some signature
+network traffic for computer viruses or malicious packets. Similarly
+filtering out spam usually involves looking for some signature
(essentially a string pattern). The point is that the fast
Knuth-Morris-Pratt algorithm for strings is not good enough for such
string \emph{patterns}.\smallskip
@@ -199,7 +201,7 @@
have been object of intense study since then. They are nowadays pretty
much ubiquitous in computer science. There are many libraries
implementing regular expressions. I am sure you have come across them
-before (remember the PRA module?).
+before (remember the PRA or PEP modules?).
Why on earth then is there any interest in studying them again in depth
in this module? Well, one answer is in the following two graphs about
@@ -323,15 +325,15 @@
attacks are already have their own name: \emph{Regular Expression
Denial of Service Attacks (ReDoS)}.
-It will be instructive to look behind the ``scenes'' to find
-out why Python and Ruby (and others) behave so badly when
-matching strings with evil regular expressions. But we will also look
-at a relatively simple algorithm that solves this problem much
-better than Python and Ruby do\ldots actually it will be two
-versions of the algorithm: the first one will be able to
-process strings of approximately 1,100 \texttt{a}s in 23
-seconds, while the second version will even be able to process
-up to 11,000(!) in 5 seconds, see the graph below:
+It will be instructive to look behind the ``scenes'' to find out why
+Python and Ruby (and others) behave so badly when matching strings with
+evil regular expressions. But we will also look at a relatively simple
+algorithm that solves this problem much better than Python and Ruby
+do\ldots actually it will be two versions of the algorithm: the first
+one will be able in the example \texttt{a?\{n\}\,a\{n\}} to process strings of
+approximately 1,100 \texttt{a}s in 23 seconds, while the second version
+will even be able to process up to 11,000(!) in 5 seconds, see the graph
+below:
\begin{center}
\begin{tikzpicture}
@@ -349,7 +351,7 @@
scaled ticks=false,
axis lines=left,
width=7cm,
- height=4.5cm,
+ height=4.4cm,
legend entries={Our Algorithm V1, Our Algorithm V2},
legend pos=outer north east]
\addplot[green,mark=square*,mark options={fill=white}] table {re2.data};