--- a/handouts/ho01.tex Sun Sep 14 14:21:59 2014 +0100
+++ b/handouts/ho01.tex Sun Sep 14 15:18:58 2014 +0100
@@ -516,6 +516,37 @@
corresponding implementations do not contain any bugs. We are
close, but not yet quite there.
+Despite my fascination, I am also happy to admit that regular
+expressions have their shortcomings. There are some well-known
+``theoretical shortcomings'', for example they cannot recognise
+strings of the form $a^{n}b^{n}$. I am not so bothered by them.
+What I am bothered about is when regular expressions get in the
+way of practical programming. For example, it turns out that the
+regular expression for email addresses shown in \eqref{email}
+is hopelessly inadequate for recognising all of them (despite
+being touted as something every computer scientist should know
+about). The W3 Consortium (which standardises the Web)
+proposes to use the following, already more complicated,
+regular expression:
+
+{\small\begin{lstlisting}[language={},keywordstyle=\color{black},numbers=none]
+[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*
+\end{lstlisting}}
+
+\noindent But they admit that by using this regular expression
+they wilfully violate the RFC 5322 standard which specifies
+the syntax of email addresses. With their proposed regular
+expression they are too strict in some cases and too lax in
+others. Not a good situation to be in. A regular expression
+that is claimed to be closer to the standard is shown in
+Figure~\ref{monster}. Whether this claim is true or not, I
+would not know; the only thing I can say is that it is a
+monstrosity. However, this might actually be an argument against the
+standard, rather than against regular expressions. Still, it
+is good to know that some tasks in text processing just
+cannot be achieved by using regular expressions.
+
+
\begin{figure}[p]
\lstinputlisting{../progs/crawler1.scala}
\caption{The Scala code for a simple web-crawler that checks
@@ -549,6 +580,16 @@
printed.\label{crawler3}}
\end{figure}
+\begin{figure}[p]
+\tiny
+\begin{center}
+\begin{minipage}{0.8\textwidth}
+\lstinputlisting[language={},keywordstyle=\color{black},numbers=none]{../progs/email-rexp}
+\end{minipage}
+\end{center}
+
+\caption{Nothing that can be said\ldots\label{monster}}
+\end{figure}
\end{document}