afl-material: comparison handouts/ho01.tex


-:564f7584eff1
+:245d302791c7
 \section*{Handout 1}
 This module is about text processing, be it for web-crawlers,
 compilers, dictionaries, DNA-data and so on. When looking for
-a particular string in a large text we can use the
+a particular string, like $abc$, in a large text we can use the
 Knuth-Morris-Pratt algorithm, which is currently the most
 efficient general string search algorithm. But often we do
 \emph{not} just look for a particular string, but for string
 patterns. For example in program code we need to identify what
 are the keywords, what are the identifiers etc. A pattern for
 \texttt{[a-z0-9\_.-]+ @ [a-z0-9.-]+ . [a-z.]\{2,6\}}
 \end{equation}
 \noindent where the first part matches one or more lowercase
 letters (\pcode{a-z}), digits (\pcode{0-9}), underscores, dots
-or hyphens. The \pcode{+} ensures the ``one or more''. Then
+and hyphens. The \pcode{+} at the end of the brackets ensures
-comes the \pcode{@}-sign, followed by the domain name which
+the ``one or more''. Then comes the \pcode{@}-sign, followed
-must be one or more lowercase letters, digits, underscores,
+by the domain name which must be one or more lowercase
-dots or hyphens. Note there cannot be an underscore in the
+letters, digits, dots or hyphens. Note there
-domain name. Finally there must be a dot followed by the
+cannot be an underscore in the domain name. Finally there must
-toplevel domain. This toplevel domain must be 2 to 6 lowercase
+be a dot followed by the toplevel domain. This toplevel domain
-letters including the dot. Example strings which follow this
+must be 2 to 6 characters, each a lowercase letter or a dot. Example
-pattern are:
+strings which follow this pattern are:
 \begin{lstlisting}[language={},numbers=none,keywordstyle=\color{black}]
 niceandsimple@example.org
 very.common@example.co.uk
 a.little.lengthy.but.fine@dept.example.ac.uk
 \noindent Possible identifiers that match this regular expression
 are \pcode{x}, \pcode{foo}, \pcode{foo_bar_1}, \pcode{A_very_42_long_object_name},
 but not \pcode{_i} and also not \pcode{4you}.
-Many programming language offer libraries that can be used to
+Many programming languages offer libraries that can be used to
 validate such strings against regular expressions. Also there
 are some common, and I am sure very familiar, ways of how to
-construct regular expressions. For example in Scala we have:
+construct regular expressions. For example in Scala we have
+a library implementing the following regular expressions:
 \begin{center}
 \begin{tabular}{lp{9cm}}
 \pcode{re*} & matches 0 or more occurrences of preceding
 expression\\
 \subsection*{Why Study Regular Expressions?}
 Regular expressions were introduced by Kleene in the 1950s
 and they have been the object of intense study since then. They
-are nowadays pretty much ubiquitous in computer science. I am
+are nowadays pretty much ubiquitous in computer science. There
-sure you have come across them before. Why on earth then is
+are many libraries implementing regular expressions. I am sure
-there any interest in studying them again in depth in this
+you have come across them before (remember PRA?). Why on earth
-module? Well, one answer is in the following graph about
+then is there any interest in studying them again in depth in
+this module? Well, one answer is in the following graph about
 regular expression matching in Python and in Ruby.
 \begin{center}
 \begin{tikzpicture}
 \begin{axis}[
 seconds for finding out whether a string of 28 \texttt{a}s
 matches the regular expression \texttt{(a?)\{28\}a\{28\}}.
 Ruby is even slightly worse.\footnote{In this example Ruby
 uses the slightly different regular expression
 \texttt{a?a?a?...a?a?aaa...aa}, where the \texttt{a?} and
-\texttt{a} each occur $n$ times. More test results can be
+\texttt{a} each occur $n$ times. More such test cases can be
 found at \url{http://www.computerbytesman.com/redos/}.}
 Admittedly, this regular expression is carefully chosen to
 exhibit this exponential behaviour, but similar ones occur
 more often than one wants in ``real life''. They are sometimes
 called \emph{evil regular expressions} because they have the
 \end{tabular}
 \end{center}
 \noindent Because we overload our notation, there are some
 subtleties you should be aware of. When regular expressions
-are referred to then $\ZERO$ (in bold font) does not stand for
+are referred to, then $\ZERO$ (in bold font) does not stand for
 the number zero: rather it is a particular pattern that does
 not match any string. Similarly, in the context of regular
 expressions, $\ONE$ does not stand for the number one but for
 a regular expression that matches the empty string. The letter
 $c$ stands for any character from the alphabet at hand. Again
 but often just write {\it hello}.
 If you prefer to think in terms of the implementation
 of regular expressions in Scala, the constructors and
 classes relate as follows\footnote{More about Scala is
-in the handout about A Crash-Course on Scala.}
+in the handout about \emph{A Crash-Course on Scala}.}
 \begin{center}
 \begin{tabular}{rcl}
 $\ZERO$       & $\mapsto$ & \texttt{ZERO}\\
 $\ONE$        & $\mapsto$ & \texttt{ONE}\\
 \begin{minipage}{0.8\textwidth}
 \lstinputlisting[language={},keywordstyle=\color{black},numbers=none]{../progs/email-rexp}
 \end{minipage}
 \end{center}
-\caption{Nothing that can be said this\ldots\label{monster}}
+\caption{Nothing that can be said about this regular
+expression\ldots\label{monster}}
 \end{figure}
 \end{document}

changeset 404	245d302791c7
parent 403	564f7584eff1
child 407	4b454a6d1814