diff -r a2c18456c6b7 -r 4755ad4b457b handouts/ho01.tex
--- a/handouts/ho01.tex	Fri Sep 25 17:39:02 2015 +0100
+++ b/handouts/ho01.tex	Fri Sep 25 20:59:24 2015 +0100
@@ -119,8 +119,8 @@
 \ref{crawler3}.\footnote{There is an interesting twist in
 the web-scraper where \pcode{re*?} is used instead of
 \pcode{re*}.} Note, however, the regular expression for
-http-addresses in web-pages in Figure~\ref{crawler1}, Line 15, is
-intended to be
+http-addresses in web-pages in Figure~\ref{crawler1}, Line 15,
+is intended to be
 
 \[
 \pcode{"https?://[^"]*"}
@@ -128,11 +128,12 @@
 
 \noindent It specifies that web-addresses need to start with
 a double quote, then comes \texttt{http} followed by an optional
-\texttt{s} and so on. Usually we would have to escape the
-double quotes in order to make sure we interpret the double
-quote as character, not as double quote for a string. But
-Scala's trick with triple quotes allows us to omit this kind
-of escaping. As a result we can just write:
+\texttt{s} and so on until the closing double quote comes.
+Usually we would have to escape the double quotes in order to
+make sure we interpret the double quote as character, not as
+double quote for a string. But Scala's trick with triple
+quotes allows us to omit this kind of escaping. As a result we
+can just write:
 
 \[
 \pcode{""""https?://[^"]*"""".r}
@@ -228,7 +229,7 @@
 
 \subsection*{Basic Regular Expressions}
 
-The regular expressions shown above, for example for Scala, we
+The regular expressions shown above for Scala, we
 will call \emph{extended regular expressions}. The ones we
 will mainly study in this module are \emph{basic regular
 expressions}, which by convention we will just call
@@ -521,7 +522,7 @@
 specification and that the corresponding implementations do
 not contain any bugs. We are close, but not yet quite there.
 
-My fascination non withstanding, I am also happy to admit that regular
+Notwithstanding my fascination, I am also happy to admit that regular
 expressions have their shortcomings. There are some well-known
 ``theoretical'' shortcomings, for example recognising strings
 of the form $a^{n}b^{n}$. I am not so bothered by them. What I
@@ -570,7 +571,7 @@
 in ``my'' domain---since these are the ones I am interested
 in to fix. It uses the regular expression \texttt{my\_urls}
 in Line~16 to check for my name in the links. The main change is
-in Lines~26--29 where there is a test whether URL is in ``my''
+in Lines~24--28 where there is a test whether URL is in ``my''
 domain or not.\label{crawler2}}
 \end{figure}
 
@@ -581,8 +582,8 @@
 
 \caption{A small email harvester---whenever we download a
 web-page, we also check whether it contains any email
 addresses. For this we use the regular expression
-\texttt{email\_pattern} in Line~16. The main change is in Line
-32 where all email addresses that can be found in a page are
+\texttt{email\_pattern} in Line~15. The main change is in Line
+30 where all email addresses that can be found in a page are
 printed.\label{crawler3}}
 \end{figure}
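The second hunk above concerns Scala's triple-quote trick for writing the regex `""""https?://[^"]*"""".r` without escaping the inner double quotes. As a minimal sketch of what that pattern does (the HTML snippet and object name below are hypothetical, not taken from the handout's crawler figures):

```scala
// Only the pattern itself comes from the handout; the rest is illustrative.
object TripleQuoteDemo {
  def main(args: Array[String]): Unit = {
    // Inside a triple-quoted string the double quotes need no escaping;
    // the outermost " characters are literal parts of the regex.
    val httpPattern = """"https?://[^"]*"""".r

    // Hypothetical page fragment containing one quoted http-address.
    val page = """<a href="http://www.example.com/index.html">link</a>"""

    // Matches the whole quoted address, surrounding quotes included.
    for (addr <- httpPattern.findAllIn(page))
      println(addr)
  }
}
```

Note that the match includes the enclosing quote characters, since the pattern begins and ends with a literal `"` — which is exactly why the web-scraper can use them to delimit addresses in HTML attributes.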