handouts/ho01.tex
changeset 473 dc528091eb70
parent 471 9476086849ad
child 477 b78664a24f5d
472:372f9801b76d 473:dc528091eb70
    48 zero or more letters, numbers and underscores.  Often we also face the
    48 zero or more letters, numbers and underscores.  Often we also face the
    49 problem that we are given a string (for example some user input) and
    49 problem that we are given a string (for example some user input) and
    50 want to know whether it matches a particular pattern---an email
    50 want to know whether it matches a particular pattern---an email
    51 address, for example. In this way we can exclude user input that would
    51 address, for example. In this way we can exclude user input that would
    52 otherwise have nasty effects on our program (crashing it or making it
    52 otherwise have nasty effects on our program (crashing it or making it
    53 go into an infinite loop, if not worse). The point is that the fast
    53 go into an infinite loop, if not worse). Scanning for computer viruses
       
    54 or filtering out spam usually involves scanning for some signature
       
    55 (essentially a pattern).  The point is that the fast
    54 Knuth-Morris-Pratt algorithm for strings is not good enough for such
    56 Knuth-Morris-Pratt algorithm for strings is not good enough for such
    55 string patterns.\smallskip
    57 string \emph{patterns}.\smallskip
    56 
    58 
    57 \defn{Regular expressions} help with conveniently specifying
    59 \defn{Regular expressions} help with conveniently specifying
    58 such patterns. The idea behind regular expressions is that
    60 such patterns. The idea behind regular expressions is that
    59 they are a simple method for describing languages (or sets of
    61 they are a simple method for describing languages (or sets of
    60 strings)\ldots at least languages we are interested in in
    62 strings)\ldots at least languages we are interested in in
   680 
   682 
   681 \noindent which explains some of the crazier parts of email
   683 \noindent which explains some of the crazier parts of email
   682 addresses. Still it is good to know that some tasks in text
   684 addresses. Still it is good to know that some tasks in text
   683 processing just cannot be achieved by using regular
   685 processing just cannot be achieved by using regular
   684 expressions. But for what we want to use them (lexing) they are
   686 expressions. But for what we want to use them (lexing) they are
   685 pretty good.
   687 pretty good.\medskip
       
   688 
       
   689 \noindent
       
   690 Finally there is a joke about regular expressions:
       
   691 
       
   692 \begin{quote}\it
       
   693   ``Sometimes you have a programming problem and it seems like the
       
   694   best solution is to use regular expressions; now you have two
       
   695   problems.''
       
   696 \end{quote}  
   686 
   697 
   687 
   698 
   688 \begin{figure}[p]
   699 \begin{figure}[p]
   689 \lstinputlisting{../progs/crawler1.scala}
   700 \lstinputlisting{../progs/crawler1.scala}
   690 
   701