afl-material: comparison handouts/ho01.tex


-:d5776c6018f0
+:39b7ff2cf1bc
 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017}
 \section*{Handout 1}
 This module is about text processing, be it for web-crawlers,
-compilers, dictionaries, DNA-data and so on.  When looking for a
+compilers, dictionaries, DNA-data, ad filters and so on.  When looking for a
 particular string, like $abc$, in a large text we can use the
 Knuth-Morris-Pratt algorithm, which is currently the most efficient
 general string search algorithm. But often we do \emph{not} just look
 for a particular string, but for string patterns. For example, in
 program code we need to identify what are the keywords (if, then,
-while, etc), what are the identifiers (variable names). A pattern for
+while, for, etc.), what are the identifiers (variable names). A pattern for
 identifiers could be stated as: they start with a letter, followed by
 zero or more letters, numbers and underscores.  Often we also face the
 problem that we are given a string (for example some user input) and
 want to know whether it matches a particular pattern---be it an email
 address, for example. In this way we can exclude user input that would
 The point of the definition of $L$ is that we can use it to
 precisely specify when a string $s$ is matched by a regular
 expression $r$, namely if and only if $s \in L(r)$. In fact we
 will write a program \pcode{match} that takes any string $s$
-and any regular expression $r$ as argument and returns
+and any regular expression $r$ as arguments and returns
 \emph{yes}, if $s \in L(r)$ and \emph{no}, if $s \not\in
 L(r)$. We leave this for the next lecture.
 There is one more feature of regular expressions that is worth
 mentioning. Given some strings, there are in general many
 sometimes contains hidden snares. They have practical importance
 (remember the shocking runtime of the regular expression matchers in
 Python, Ruby and Java in some instances and the problems in Stack
 Exchange and the Atom editor).  People who are not very familiar with
 the mathematical background of regular expressions get them
-consistently wrong (surprising given they are a supposed to be core
+consistently wrong (this is surprising given they are supposed to be a
-skill for computer scientists). The hope is that we can do better in
+core skill for computer scientists). The hope is that we can do better
-the future---for example by proving that the algorithms actually
+in the future---for example by proving that the algorithms actually
 satisfy their specification and that the corresponding implementations
 do not contain any bugs. We are close, but not yet quite there.
 Notwithstanding my fascination, I am also happy to admit that regular
 expressions have their shortcomings. There are some well-known
 ``theoretical'' shortcomings, for example recognising strings of the
 form $a^{n}b^{n}$ is not possible with regular expressions. This means
 for example that recognising whether parentheses are well-nested
-is impossible with (basic) regular expressions.  I am not so bothered
+in an expression is impossible with (basic) regular expressions.  I am
-by these shortcomings. What I am bothered about is when regular
+not so bothered by these shortcomings. What I am bothered about is
-expressions are in the way of practical programming. For example, it
+when regular expressions are in the way of practical programming. For
-turns out that the regular expression for email addresses shown in
+example, it turns out that the regular expression for email addresses
-\eqref{email} is hopelessly inadequate for recognising all of them
+shown in \eqref{email} is hopelessly inadequate for recognising all of
-(despite being touted as something every computer scientist should
+them (despite being touted as something every computer scientist
-know about). The W3 Consortium (which standardises the Web) proposes
+should know about). The W3 Consortium (which standardises the Web)
-to use the following, already more complicated regular expressions for
+proposes to use the following, already more complicated regular
-email addresses:
+expression for email addresses:
 {\small\begin{lstlisting}[language={},keywordstyle=\color{black},numbers=none]
 [a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*
 \end{lstlisting}}
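
To get a feel for what this pattern accepts and rejects, it can be tried out with an off-the-shelf matcher. The following is only a rough sketch and not part of the handout: it assumes Python's standard \texttt{re} module, and the names \texttt{W3C\_EMAIL} and \texttt{is\_w3c\_email} are made up for illustration. It uses \texttt{re.fullmatch} so that the whole string, not just a prefix, has to match the pattern.

```python
import re

# The W3C pattern from the listing above, copied verbatim.
# A raw string is used so that backslashes reach the regex engine unchanged.
W3C_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*"
)


def is_w3c_email(s: str) -> bool:
    """Return True if the *entire* string s matches the W3C pattern.

    Hypothetical helper for illustration only.
    """
    return W3C_EMAIL.fullmatch(s) is not None


if __name__ == "__main__":
    # A few sample inputs; the address strings are invented examples.
    for candidate in ["first.last@example.co.uk", "user@localhost",
                      "no-at-sign", "a@b..c", "a b@example.org"]:
        print(candidate, is_w3c_email(candidate))
```

Note that the pattern accepts a domain without any dot (such as \texttt{user@localhost}) and rejects two consecutive dots in the domain part; whether that is the behaviour one wants depends on the application.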

changeset 492	39b7ff2cf1bc
parent 477	b78664a24f5d
child 496	5c9de27a5b30