diff -r 8d5aaf5b0031 -r 771042ac7c3f handouts/ho01.tex
--- a/handouts/ho01.tex	Sun Sep 14 14:21:59 2014 +0100
+++ b/handouts/ho01.tex	Sun Sep 14 15:18:58 2014 +0100
@@ -516,6 +516,37 @@
 corresponding implementations do not contain any bugs. We are
 close, but not yet quite there.
 
+Despite my fascination, I am also happy to admit that regular
+expressions have their shortcomings. There are some well-known
+``theoretical shortcomings'': for instance, no regular expression
+matches exactly the strings of the form $a^{n}b^{n}$. I am not so
+bothered by those. What bothers me is when regular expressions get
+in the way of practical programming. For example, it turns out that
+the regular expression for email addresses shown in \eqref{email}
+is hopelessly inadequate for recognising all valid addresses
+(despite being touted as something every computer scientist
+should know about). The W3C (which standardises the Web)
+proposes the following, already more complicated, regular
+expression:
+
+{\small\begin{lstlisting}[language={},keywordstyle=\color{black},numbers=none]
+[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*
+\end{lstlisting}}
+
+\noindent But they admit that by using this regular expression
+they wilfully violate RFC 5322, the standard that specifies
+the syntax of email addresses. With their proposed regular
+expression they are too strict in some cases and too lax in
+others. Not a good situation to be in. A regular expression
+that is claimed to be closer to the standard is shown in
+Figure~\ref{monster}. Whether this claim is true or not, I
+would not know; all I can say is that it is a monstrosity.
+However, this might actually be an argument against the
+standard, rather than against regular expressions. Still, it
+is good to know that some tasks in text processing just
+cannot be achieved by using regular expressions.
+
+
 \begin{figure}[p]
 \lstinputlisting{../progs/crawler1.scala}
 \caption{The Scala code for a simple web-crawler that checks
@@ -549,6 +580,16 @@
 printed.\label{crawler3}}
 \end{figure}
 
+\begin{figure}[p]
+\tiny
+\begin{center}
+\begin{minipage}{0.8\textwidth}
+\lstinputlisting[language={},keywordstyle=\color{black},numbers=none]{../progs/email-rexp}
+\end{minipage}
+\end{center}
+
+\caption{Nothing that can be said\ldots\label{monster}}
+\end{figure}
 
 \end{document}
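To make "too strict in some cases and too lax in others" concrete, here is a small Scala sketch that runs the W3C regular expression quoted above on four hand-picked addresses and compares the result with their status under RFC 5322. The object name EmailCheck, the helper accepts and the sample addresses are hypothetical and made up for illustration. The RFC 5322 verdict attached to each sample is an assumption based on my reading of the addr-spec grammar (Section 3.4.1), not something stated in this form by the W3C or the RFC. The check uses a full match, so the expression must cover the whole string.

```scala
import scala.util.matching.Regex

// Hypothetical helper object: checks the W3C-proposed email regex against
// a few hand-picked addresses whose RFC 5322 status is assumed, not tested.
object EmailCheck {
  // The regular expression quoted above, compiled with java.util.regex.
  val w3c: Regex =
    """[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*""".r

  // True iff the *whole* string matches (not merely some substring).
  def accepts(s: String): Boolean = w3c.pattern.matcher(s).matches()

  // (address, valid according to RFC 5322?): the second component is an
  // assumption based on a reading of RFC 5322, Section 3.4.1.
  val samples: List[(String, Boolean)] = List(
    ("alice@example.com", true),
    ("a..b@example.com", false),        // consecutive dots are not a dot-atom
    ("\"john doe\"@example.com", true), // quoted local part
    ("alice@[192.168.0.1]", true)       // domain literal
  )

  def main(args: Array[String]): Unit =
    for ((addr, rfc) <- samples) {
      val ok = accepts(addr)
      val verdict =
        if (ok == rfc) "agrees with RFC 5322"
        else if (ok) "too lax"
        else "too strict"
      println(s"$addr: regex says $ok, $verdict")
    }
}
```

Running EmailCheck.main prints one line per sample: the regex accepts a..b@example.com (too lax) and rejects both the quoted local part and the domain literal (too strict), while alice@example.com is accepted as expected.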