\usepackage{../data}

%%http://regexcrossword.com/challenges/cities/puzzles/1

\begin{document}
\fnote{\copyright{} Christian Urban, 2014, 2015}

\section*{Handout 1}

This module is about text processing, be it for web-crawlers,
compilers, dictionaries, DNA-data and so on. When looking for
a particular string in a large text we can use the
Knuth-Morris-Pratt algorithm, which runs in time linear in the
length of the text and is among the most efficient general
string search algorithms. But often we do
\emph{not} just look for a particular string, but for string
patterns. For example, in program code we need to identify
which words are keywords, which are identifiers, and so on. A
pattern for identifiers could be stated as: they start with a
letter, followed by zero or more letters, digits and
underscores. Often we also face the problem that we are given
a string (for example some user input) and want to know
whether it matches a particular pattern. In this way we can,
for example, exclude user input that would otherwise have
nasty effects on our program (crashing it or making it go into
an infinite loop, if not worse).

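
To make the identifier pattern concrete, here is a small sketch in
the character-class notation used by many regex libraries. This
notation is an assumption for illustration only, since the handout
has not introduced it at this point:

\begin{center}
\texttt{[a-zA-Z][a-zA-Z0-9\_]*}
\end{center}

Read from left to right: one letter, followed by zero or more
letters, digits or underscores. Under this reading the strings
\texttt{x}, \texttt{foo\_1} and \texttt{Bar} match the pattern,
while \texttt{1abc} and \texttt{\_tmp} do not, because neither
starts with a letter.
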
\defn{Regular expressions} help with conveniently specifying
such patterns. The idea behind regular expressions is that
they are a simple method for describing languages (or sets of
strings)\ldots at least languages we are interested in in