diff -r 564f7584eff1 -r 245d302791c7 handouts/ho01.tex --- a/handouts/ho01.tex Wed Jun 01 13:04:06 2016 +0100 +++ b/handouts/ho01.tex Tue Jun 14 11:41:48 2016 +0100 @@ -34,7 +34,7 @@ This module is about text processing, be it for web-crawlers, compilers, dictionaries, DNA-data and so on. When looking for -a particular string in a large text we can use the +a particular string, like $abc$ in a large text we can use the Knuth-Morris-Pratt algorithm, which is currently the most efficient general string search algorithm. But often we do \emph{not} just look for a particular string, but for string @@ -67,14 +67,14 @@ \noindent where the first part matches one or more lowercase letters (\pcode{a-z}), digits (\pcode{0-9}), underscores, dots -or hyphens. The \pcode{+} ensures the ``one or more''. Then -comes the \pcode{@}-sign, followed by the domain name which -must be one or more lowercase letters, digits, underscores, -dots or hyphens. Note there cannot be an underscore in the -domain name. Finally there must be a dot followed by the -toplevel domain. This toplevel domain must be 2 to 6 lowercase -letters including the dot. Example strings which follow this -pattern are: +and hyphens. The \pcode{+} at the end of the brackets ensures +the ``one or more''. Then comes the \pcode{@}-sign, followed +by the domain name which must be one or more lowercase +letters, digits, underscores, dots or hyphens. Note there +cannot be an underscore in the domain name. Finally there must +be a dot followed by the toplevel domain. This toplevel domain +must be 2 to 6 lowercase letters including the dot. Example +strings which follow this pattern are: \begin{lstlisting}[language={},numbers=none,keywordstyle=\color{black}] niceandsimple@example.org @@ -113,10 +113,11 @@ are \pcode{x}, \pcode{foo}, \pcode{foo_bar_1}, \pcode{A_very_42_long_object_name}, but not \pcode{_i} and also not \pcode{4you}. -Many programming language offer libraries that can be used to +Many programming languages offer libraries that can be used to validate such strings against regular expressions. Also there are some common, and I am sure very familiar, ways of how to -construct regular expressions. For example in Scala we have: +construct regular expressions. For example in Scala we have +a library implementing the following regular expressions: \begin{center} \begin{tabular}{lp{9cm}} @@ -177,10 +178,11 @@ Regular expressions were introduced by Kleene in the 1950ies and they have been object of intense study since then. They -are nowadays pretty much ubiquitous in computer science. I am -sure you have come across them before. Why on earth then is -there any interest in studying them again in depth in this -module? Well, one answer is in the following graph about +are nowadays pretty much ubiquitous in computer science. There +are many libraries implementing regular expressions. I am sure +you have come across them before (remember PRA?). Why on earth +then is there any interest in studying them again in depth in +this module? Well, one answer is in the following graph about regular expression matching in Python and in Ruby. \begin{center} @@ -214,7 +216,7 @@ Ruby is even slightly worse.\footnote{In this example Ruby uses the slightly different regular expression \texttt{a?a?a?...a?a?aaa...aa}, where the \texttt{a?} and -\texttt{a} each occur $n$ times. More test results can be +\texttt{a} each occur $n$ times. More such test cases can be found at \url{http://www.computerbytesman.com/redos/}.} Admittedly, this regular expression is carefully chosen to exhibit this exponential behaviour, but similar ones occur @@ -285,7 +287,7 @@ \noindent Because we overload our notation, there are some subtleties you should be aware of. When regular expressions -are referred to then $\ZERO$ (in bold font) does not stand for +are referred to, then $\ZERO$ (in bold font) does not stand for the number zero: rather it is a particular pattern that does not match any string. Similarly, in the context of regular expressions, $\ONE$ does not stand for the number one but for @@ -368,7 +370,7 @@ If you prefer to think in terms of the implementation of regular expressions in Scala, the constructors and classes relate as follows\footnote{More about Scala is -in the handout about A Crash-Course on Scala.} +in the handout about \emph{A Crash-Course on Scala}.} \begin{center} \begin{tabular}{rcl} @@ -663,7 +665,8 @@ \end{minipage} \end{center} -\caption{Nothing that can be said this\ldots\label{monster}} +\caption{Nothing that can be said about this regular +expression\ldots\label{monster}} \end{figure}