--- a/handouts/ho01.tex Fri Sep 25 17:39:02 2015 +0100
+++ b/handouts/ho01.tex Fri Sep 25 20:59:24 2015 +0100
@@ -119,8 +119,8 @@
\ref{crawler3}.\footnote{There is an interesting twist in the
web-scraper where \pcode{re*?} is used instead of
\pcode{re*}.} Note, however, the regular expression for
-http-addresses in web-pages in Figure~\ref{crawler1}, Line 15, is
-intended to be
+http-addresses in web-pages in Figure~\ref{crawler1}, Line 15,
+is intended to be
\[
\pcode{"https?://[^"]*"}
@@ -128,11 +128,12 @@
\noindent It specifies that web-addresses need to start with a
double quote, then comes \texttt{http} followed by an optional
-\texttt{s} and so on. Usually we would have to escape the
-double quotes in order to make sure we interpret the double
-quote as character, not as double quote for a string. But
-Scala's trick with triple quotes allows us to omit this kind
-of escaping. As a result we can just write:
+\texttt{s} and so on until the closing double quote.
+Usually we would have to escape the double quotes in order to
+make sure we interpret the double quote as a character, not
+as the delimiter of a string. But Scala's trick with triple
+quotes allows us to omit this kind of escaping. As a result
+we can just write:
\[
\pcode{""""https?://[^"]*"""".r}
@@ -228,7 +229,7 @@
\subsection*{Basic Regular Expressions}
-The regular expressions shown above, for example for Scala, we
+The regular expressions shown above for Scala, we
will call \emph{extended regular expressions}. The ones we
will mainly study in this module are \emph{basic regular
expressions}, which by convention we will just call
@@ -521,7 +522,7 @@
specification and that the corresponding implementations do
not contain any bugs. We are close, but not yet quite there.
-My fascination non withstanding, I am also happy to admit that regular
+Notwithstanding my fascination, I am also happy to admit that regular
expressions have their shortcomings. There are some well-known
``theoretical'' shortcomings, for example recognising strings
of the form $a^{n}b^{n}$. I am not so bothered by them. What I
@@ -570,7 +571,7 @@
in ``my'' domain---since these are the ones I am interested in
to fix. It uses the regular expression \texttt{my\_urls} in
Line~16 to check for my name in the links. The main change is
-in Lines~26--29 where there is a test whether URL is in ``my''
+in Lines~24--28 where there is a test whether URL is in ``my''
domain or not.\label{crawler2}}
\end{figure}
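For readers who want to see the shape of that test, here is a hedged
sketch in Scala; the pattern below is only a stand-in
(\texttt{"myname"}), not the actual \texttt{my\_urls} regular
expression from Figure~\ref{crawler2}.

\begin{verbatim}
// hypothetical sketch only: "myname" is a stand-in pattern,
// not the regular expression used in the figure
val my_urls = """myname""".r

// a URL counts as being in "my" domain if the pattern
// matches somewhere inside it
def in_my_domain(url: String): Boolean =
  my_urls.findFirstIn(url).isDefined
\end{verbatim}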
@@ -581,8 +582,8 @@
\caption{A small email harvester---whenever we download a
web-page, we also check whether it contains any email
addresses. For this we use the regular expression
-\texttt{email\_pattern} in Line~16. The main change is in Line
-32 where all email addresses that can be found in a page are
+\texttt{email\_pattern} in Line~15. The main change is in Line
+30 where all email addresses that can be found in a page are
printed.\label{crawler3}}
\end{figure}
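As with the URL test above, a hedged sketch of such an email check in
Scala might look as follows; the pattern is only illustrative and may
well differ from the actual \texttt{email\_pattern} used in
Figure~\ref{crawler3}.

\begin{verbatim}
// illustrative email pattern; the figure's actual email_pattern
// may differ in detail
val email_pattern = """[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,6}""".r

// print every email address that occurs in a downloaded page
def print_emails(page: String): Unit =
  email_pattern.findAllIn(page).foreach(println)
\end{verbatim}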