% handouts/ho01.tex
% changeset 395 e57d3d92b856
% parent 363 0d6deecdb2eb
% child 398 c8ce95067c1a
\usepackage{../data}

%%http://regexcrossword.com/challenges/cities/puzzles/1

\begin{document}
\fnote{\copyright{} Christian Urban, 2014, 2015}

\section*{Handout 1}

This module is about text processing, be it for web crawlers,
compilers, dictionaries, DNA data and so on. When looking for
a particular string in a large text we can use the
Knuth-Morris-Pratt algorithm, a classic general string search
algorithm whose running time is linear in the lengths of the
text and of the string being searched for. But often we are
\emph{not} just looking for a particular string, but for string
patterns. For example, in program code we need to identify
which words are keywords, which are identifiers, and so on. A
pattern for identifiers could be stated as: an identifier
starts with a letter, followed by zero or more letters,
numbers and underscores. We also often face the problem that
we are given a string (for example some user input) and want
to know whether it matches a particular pattern. In this way
we can, for example, exclude user input that would otherwise
have nasty effects on our program (crashing it or making it go
into an infinite loop, if not worse).
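
The Knuth-Morris-Pratt idea can be sketched in Python as
follows. This is a minimal illustration, not code from this
module, and the function names \texttt{prefix\_table} and
\texttt{kmp\_search} are hypothetical. For every prefix of the
search string, the table records the length of its longest
proper prefix that is also a suffix. On a mismatch the search
falls back using this table instead of moving backwards in the
text. This is why the whole search is linear in the lengths of
the text and the search string.

\begin{verbatim}
from __future__ import annotations

# Hypothetical helper names; a sketch, not the module's code.


def prefix_table(pattern: str) -> list[int]:
    # table[i] is the length of the longest proper prefix of
    # pattern[:i+1] that is also a suffix of pattern[:i+1]
    table = [0] * len(pattern)
    k = 0  # length of the prefix matched so far
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]  # fall back to a shorter prefix
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start positions of all occurrences of pattern in text."""
    if not pattern:
        return list(range(len(text) + 1))
    table = prefix_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, c in enumerate(text):
        # on a mismatch, consult the table instead of moving back in text
        while k > 0 and c != pattern[k]:
            k = table[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = table[k - 1]  # keep going, so overlapping matches are found
    return matches


print(kmp_search("abababca", "abab"))  # prints [0, 2]
\end{verbatim}

\noindent For example, searching for \texttt{abab} in
\texttt{abababca} reports the overlapping occurrences at
positions 0 and 2.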
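
The identifier pattern above can be made concrete with a small
Python sketch. The names are hypothetical and the sketch is not
taken from this module. It checks the rule once by hand and
once, for comparison, with a pattern written for Python's
standard \texttt{re} module. It assumes that ``letter'' means
an ASCII letter and ``number'' means a decimal digit.

\begin{verbatim}
import re
import string

# Sketch with hypothetical names. Assumption: "letter" means an ASCII
# letter (a-z, A-Z) and "number" means a decimal digit (0-9).
LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def is_identifier(s: str) -> bool:
    """By hand: a letter, then zero or more letters, digits or underscores."""
    if not s or s[0] not in LETTERS:
        return False
    return all(c in LETTERS or c in DIGITS or c == "_" for c in s[1:])


# The same pattern in the syntax of Python's re module.
IDENTIFIER_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_identifier_re(s: str) -> bool:
    # fullmatch succeeds only if the whole string matches, not a prefix
    return IDENTIFIER_RE.fullmatch(s) is not None


for candidate in ["x", "count_2", "_tmp", "2fast", "a-b", ""]:
    print(repr(candidate), is_identifier(candidate), is_identifier_re(candidate))
\end{verbatim}

\noindent Both functions accept \texttt{x} and
\texttt{count\_2} and reject \texttt{\_tmp}, \texttt{2fast},
\texttt{a-b} and the empty string. One might apply a check of
this kind to user input before processing it further.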

\defn{Regular expressions} provide a convenient way of
specifying such patterns. The idea behind regular expressions
is that they are a simple method for describing languages (or
sets of strings)\ldots{} at least languages we are interested in in