--- a/handouts/ho01.tex Wed May 10 17:03:21 2017 +0100
+++ b/handouts/ho01.tex Sun May 21 00:43:02 2017 +0100
@@ -39,13 +39,13 @@
\section*{Handout 1}
This module is about text processing, be it for web-crawlers,
-compilers, dictionaries, DNA-data and so on. When looking for a
+compilers, dictionaries, DNA-data, ad filters and so on. When looking for a
particular string, like $abc$, in a large text, we can use the
Knuth-Morris-Pratt algorithm, which is among the most efficient
general string search algorithms. But often we do \emph{not} just look
for a particular string, but for string patterns. For example, in
program code we need to identify what are the keywords (if, then,
-while, etc), what are the identifiers (variable names). A pattern for
+while, for, etc), what are the identifiers (variable names). A pattern for
identifiers could be stated as: they start with a letter, followed by
zero or more letters, numbers and underscores.
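+Written out concretely, one possible rendering of this pattern in
+regular expression notation is (the exact syntax varies between
+libraries):
+
+{\small\begin{lstlisting}[language={},keywordstyle=\color{black},numbers=none]
+[a-zA-Z][a-zA-Z0-9_]*
+\end{lstlisting}}
+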
Often we also face the problem that we are given a string (for example some user input) and
@@ -536,7 +536,7 @@
precisely specify when a string $s$ is matched by a regular
expression $r$, namely if and only if $s \in L(r)$. In fact we
will write a program \pcode{match} that takes any string $s$
-and any regular expression $r$ as argument and returns
+and any regular expression $r$ as arguments and returns
\emph{yes}, if $s \in L(r)$ and \emph{no}, if $s \not\in
L(r)$. We leave this for the next lecture.
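+
+To give a first idea, the interface of such a program could be
+sketched in Scala as follows (the datatype and constructor names
+below are only illustrative; the implementation is left for the next
+lecture):
+
+{\small\begin{lstlisting}[language={},keywordstyle=\color{black},numbers=none]
+abstract class Rexp
+case object ZERO extends Rexp                    // matches nothing
+case object ONE extends Rexp                     // matches the empty string
+case class CHAR(c: Char) extends Rexp            // matches the character c
+case class ALT(r1: Rexp, r2: Rexp) extends Rexp  // alternative: r1 or r2
+case class SEQ(r1: Rexp, r2: Rexp) extends Rexp  // sequence: r1 followed by r2
+case class STAR(r: Rexp) extends Rexp            // zero or more repetitions
+
+// specification: matches(r, s) is true if and only if s is in L(r)
+def matches(r: Rexp, s: String): Boolean = ???   // left for the next lecture
+\end{lstlisting}}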
@@ -641,9 +641,9 @@
Python, Ruby and Java in some instances and the problems in Stack
Exchange and the Atom editor). People who are not very familiar with
the mathematical background of regular expressions get them
-consistently wrong (surprising given they are a supposed to be core
-skill for computer scientists). The hope is that we can do better in
-the future---for example by proving that the algorithms actually
+consistently wrong (this is surprising given they are supposed to be
+a core skill for computer scientists). The hope is that we can do
+better in the future---for example by proving that the algorithms actually
satisfy their specification and that the corresponding implementations
do not contain any bugs. We are close, but not yet quite there.
@@ -652,15 +652,15 @@
``theoretical'' shortcomings, for example recognising strings of the
form $a^{n}b^{n}$ is not possible with regular expressions. This means,
for example, that recognising whether parentheses are well-nested
-is impossible with (basic) regular expressions. I am not so bothered
-by these shortcomings. What I am bothered about is when regular
-expressions are in the way of practical programming. For example, it
-turns out that the regular expression for email addresses shown in
-\eqref{email} is hopelessly inadequate for recognising all of them
-(despite being touted as something every computer scientist should
-know about). The W3 Consortium (which standardises the Web) proposes
-to use the following, already more complicated regular expressions for
-email addresses:
+in an expression is impossible with (basic) regular expressions.
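+For contrast, a program that simply counts parentheses can perform
+this check easily; a minimal sketch in Scala (illustrative only):
+
+{\small\begin{lstlisting}[language={},keywordstyle=\color{black},numbers=none]
+// well-nestedness needs an unbounded counter, which is
+// exactly what (basic) regular expressions lack
+def balanced(s: String): Boolean = {
+  var depth = 0
+  for (c <- s) {
+    if (c == '(') depth = depth + 1
+    if (c == ')') depth = depth - 1
+    if (depth < 0) return false   // closing parenthesis without a match
+  }
+  depth == 0                      // every opened parenthesis is closed
+}
+\end{lstlisting}}
+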
+I am not so bothered by these shortcomings. What I am bothered about
+is when regular expressions are in the way of practical programming.
+For example, it turns out that the regular expression for email
+addresses shown in \eqref{email} is hopelessly inadequate for
+recognising all of them (despite being touted as something every
+computer scientist should know about). The W3 Consortium (which
+standardises the Web) proposes to use the following, already more
+complicated regular expression for email addresses:
{\small\begin{lstlisting}[language={},keywordstyle=\color{black},numbers=none]
[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*