1 \documentclass{article} |
1 \documentclass{article} |
2 \usepackage{../style} |
2 \usepackage{../style} |
3 \usepackage{../langs} |
3 \usepackage{../langs} |
4 |
4 \usepackage{../graphics} |
5 \usepackage{xcolor} |
|
6 \usepackage{tikz} |
|
7 \usetikzlibrary{arrows} |
|
8 \usetikzlibrary{automata} |
|
9 \usetikzlibrary{shapes} |
|
10 \usetikzlibrary{shadows} |
|
11 \usetikzlibrary{positioning} |
|
12 \usetikzlibrary{calc} |
|
13 \usetikzlibrary{fit} |
|
14 \usetikzlibrary{backgrounds} |
|
15 |
5 |
16 |
6 |
17 \begin{document} |
7 \begin{document} |
18 |
8 |
19 \section*{Handout 4} |
9 \section*{Handout 4 (Sulzmann \& Lu Algorithm)} |
|
10 |
|
11 So far our algorithm based on derivatives was only able to say |
|
12 yes or no depending on whether a string was matched by regular |
|
13 expression or not. Often a more interesting question is to |
|
14 find out \emph{how} a regular expression matched a string? |
|
15 Answering this question will help us with the problem we are |
|
16 after, namely tokenising an input string, that is splitting it |
|
17 up into its ``word'' components. The algorithm we will be |
|
18 looking at was designed by Sulzmann \& Lu in a rather recent |
|
19 paper. A link to it is provided at KEATS, in case you are |
|
20 interested.\footnote{In my humble opinion this is an |
|
21 interesting instance of the research literature: it contains a |
|
22 very neat idea, but its presentation is rather sloppy. |
|
23 Students and I found several rather annoying typos in their |
|
24 examples and definitions.} |
|
25 |
|
26 In order to give an answer for how a regular expression matched |
|
27 a string, Sulzmann and Lu introduce \emph{values}. A value |
|
28 will be the output of the algorithm whenever the regular |
|
29 expression matches the string. If not an error will be raised. |
|
30 Since the first phase of the algorithm is identical to the |
|
31 derivative based matcher from the first coursework, the |
|
32 function $nullable$ will be used to decide whether as string |
|
33 is matched by a regular expression. |
|
34 |
|
35 |
|
36 |
20 |
37 |
21 Algorithm by Sulzmann, Lexing |
38 Algorithm by Sulzmann, Lexing |
22 |
39 |
23 \end{document} |
40 \end{document} |
24 |
41 |