updated
author Christian Urban <urbanc@in.tum.de>
Thu, 25 Jul 2019 14:39:37 +0100
changeset 622 b47e140bcccd
parent 621 cf287db8dc15
child 623 47a299e7010f
handouts/ho01.pdf
handouts/ho01.tex
Binary file handouts/ho01.pdf has changed
--- a/handouts/ho01.tex	Wed Jul 17 11:17:52 2019 +0100
+++ b/handouts/ho01.tex	Thu Jul 25 14:39:37 2019 +0100
@@ -36,13 +36,15 @@
 % https://gcc.godbolt.org
 
 %https://www.youtube.com/watch?v=gmhMQfFQu20
+
+
 \begin{document}
 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018, 2019}
 
 \section*{Handout 1}
 
 The first part of this module is about text processing, be it for
-web-crawlers, compilers, dictionaries, DNA-data, ad filters and so on.
+web-crawlers, compilers, dictionaries, DNA-data, spam-filters and so on.
 When looking for a particular string, say \pcode{"foobar"}, in a large
 text we can use the Knuth-Morris-Pratt algorithm, which is currently the
 most efficient general string search algorithm. But often we do
@@ -59,14 +61,14 @@
 
 Often we also face the problem that we are given a string, for example
 some user input, and we want to know whether it matches a particular
-pattern---be it an email address, for example. In this way we can
+pattern---for example, is it an email address? In this way we can
 exclude user input that would otherwise have nasty effects on our
 program (crashing it or making it go into an infinite loop, if not
-worse). This kind of ``inspecting'' mechanism is implemented in popular
-network security tools such as Snort and
+worse). This kind of ``inspecting'' mechanism is also implemented in
+popular network security tools such as Snort and
 Bro.\footnote{\url{www.snort.org}, \url{www.bro.org}} They scan incoming
-network traffic for computer viruses or malicious packets. Also
-filtering out spam usually involves scanning for some signature
+network traffic for computer viruses or malicious packets. Similarly,
+filtering out spam usually involves looking for some signature
 (essentially a string pattern). The point is that the fast
 Knuth-Morris-Pratt algorithm for strings is not good enough for such
 string \emph{patterns}.\smallskip
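
As a minimal sketch of the kind of input check described above (not part of the handout), one could use Python's re module; the pattern and the helper name looks_like_email are made up for illustration and deliberately simplistic:

import re

# Deliberately simplified email pattern, for illustration only
EMAIL = re.compile(r"[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,}")

def looks_like_email(s: str) -> bool:
    # fullmatch: the whole input must fit the pattern, not just a substring
    return EMAIL.fullmatch(s) is not None

print(looks_like_email("urbanc@in.tum.de"))   # True
print(looks_like_email("not an address"))     # False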
@@ -199,7 +201,7 @@
 have been the object of intense study since then. They are nowadays pretty
 much ubiquitous in computer science. There are many libraries
 implementing regular expressions. I am sure you have come across them
-before (remember the PRA module?). 
+before (remember the PRA or PEP modules?). 
 
 Why on earth then is there any interest in studying them again in depth
 in this module? Well, one answer is in the following two graphs about
@@ -323,15 +325,15 @@
  attacks already have their own name: \emph{Regular Expression
   Denial of Service Attacks (ReDoS)}.
 
-It will be instructive to look behind the ``scenes'' to find
-out why Python and Ruby (and others) behave so badly when
-matching strings with evil regular expressions. But we will also look
-at a relatively simple algorithm that solves this problem much
-better than Python and Ruby do\ldots actually it will be two
-versions of the algorithm: the first one will be able to
-process strings of approximately 1,100 \texttt{a}s in 23 
-seconds, while the second version will even be able to process
-up to 11,000(!) in 5 seconds, see the graph below:
+It will be instructive to look behind the ``scenes'' to find out why
+Python and Ruby (and others) behave so badly when matching strings with
+evil regular expressions. But we will also look at a relatively simple
+algorithm that solves this problem much better than Python and Ruby
+do\ldots actually it will be two versions of the algorithm: the first
+one will be able, in the example \texttt{a?\{n\}\,a\{n\}}, to process strings of
+approximately 1,100 \texttt{a}s in 23 seconds, while the second version
+will even be able to process up to 11,000(!) in 5 seconds, see the graph
+below:
 
 \begin{center}
 \begin{tikzpicture}
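
A minimal sketch (not part of the handout) of how one could reproduce the bad backtracking behaviour mentioned above with Python's re module, using the evil pattern a?{n} a{n}; the helper name and the chosen values of n are illustrative, and absolute timings depend on the machine:

import re
import time

def evil_match_time(n: int) -> float:
    # a?{n} a{n} written out for Python's re: n optional a's, then n mandatory a's
    pattern = re.compile("a?" * n + "a" * n)
    start = time.perf_counter()
    pattern.match("a" * n)   # backtracking explores exponentially many splits
    return time.perf_counter() - start

# Already for n around 25-30 this takes seconds on a typical machine,
# and the time roughly doubles with each additional a.
for n in [10, 20, 25]:
    print(n, round(evil_match_time(n), 3))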
@@ -349,7 +351,7 @@
     scaled ticks=false,
     axis lines=left,
     width=7cm,
-    height=4.5cm,
+    height=4.4cm,
     legend entries={Our Algorithm V1, Our Algorithm V2},
     legend pos=outer north east]
 \addplot[green,mark=square*,mark options={fill=white}] table {re2.data};