11 So far our algorithm based on derivatives was only able to say |
11 So far our algorithm based on derivatives was only able to say |
12 yes or no depending on whether a string was matched by regular |
12 yes or no depending on whether a string was matched by regular |
13 expression or not. Often a more interesting question is to |
13 expression or not. Often a more interesting question is to |
14 find out \emph{how} a regular expression matched a string? |
14 find out \emph{how} a regular expression matched a string? |
15 Answering this question will also help us with the problem we |
15 Answering this question will also help us with the problem we |
16 are after, namely tokenising an input string. The algorithm we |
16 are after, namely tokenising an input string. |
17 will be looking at for this was designed by Sulzmann \& Lu in |
17 |
18 a rather recent paper. A link to it is provided on KEATS, in |
18 The algorithm we will be looking at was designed by Sulzmann \& Lu in |
19 case you are interested.\footnote{In my humble opinion this is |
19 a rather recent paper. A link to it is provided on KEATS, in case you |
20 an interesting instance of the research literature: it |
20 are interested.\footnote{In my humble opinion this is an interesting |
21 contains a very neat idea, but its presentation is rather |
21 instance of the research literature: it contains a very neat idea, |
22 sloppy. In earlier versions of their paper, students and I |
22 but its presentation is rather sloppy. In earlier versions of their |
23 found several rather annoying typos in their examples and |
23 paper, students and I found several rather annoying typos in their |
24 definitions.} |
24 examples and definitions.} In order to give an answer for how a |
25 |
25 regular expression matched a string, Sulzmann and Lu introduce |
26 In order to give an answer for how a regular expression |
26 \emph{values}. A value will be the output of the algorithm whenever |
27 matched a string, Sulzmann and Lu introduce \emph{values}. A |
27 the regular expression matches the string. If the string does not |
28 value will be the output of the algorithm whenever the regular |
28 match the string, an error will be raised. Since the first phase of |
29 expression matches the string. If not, an error will be |
29 the algorithm by Sulzmann \& Lu is identical to the derivative based |
30 raised. Since the first phase of the algorithm by Sulzmann \& |
30 matcher from the first coursework, the function $nullable$ will be |
31 Lu is identical to the derivative based matcher from the first |
31 used to decide whether as string is matched by a regular |
32 coursework, the function $nullable$ will be used to decide |
32 expression. If $nullable$ says yes, then values are constructed that |
33 whether as string is matched by a regular expression. If |
33 reflect how the regular expression matched the string. The definitions |
34 $nullable$ says yes, then values are constructed that reflect |
34 for regular expressions $r$ and values $v$ is shown next to each other |
35 how the regular expression matched the string. The definitions |
35 below: |
36 for regular expressions $r$ and values $v$ is shown next to |
|
37 each other below: |
|
38 |
36 |
39 \begin{center} |
37 \begin{center} |
40 \begin{tabular}{cc} |
38 \begin{tabular}{cc} |
41 \begin{tabular}{@{}rrl@{}} |
39 \begin{tabular}{@{}rrl@{}} |
42 \multicolumn{3}{c}{regular expressions}\\ |
40 \multicolumn{3}{c}{regular expressions}\medskip\\ |
43 $r$ & $::=$ & $\varnothing$\\ |
41 $r$ & $::=$ & $\varnothing$\\ |
44 & $\mid$ & $\epsilon$ \\ |
42 & $\mid$ & $\epsilon$ \\ |
45 & $\mid$ & $c$ \\ |
43 & $\mid$ & $c$ \\ |
46 & $\mid$ & $r_1 \cdot r_2$\\ |
44 & $\mid$ & $r_1 \cdot r_2$\\ |
47 & $\mid$ & $r_1 + r_2$ \\ |
45 & $\mid$ & $r_1 + r_2$ \\ |
48 \\ |
46 \\ |
49 & $\mid$ & $r^*$ \\ |
47 & $\mid$ & $r^*$ \\ |
50 \end{tabular} |
48 \end{tabular} |
51 & |
49 & |
52 \begin{tabular}{@{\hspace{0mm}}rrl@{}} |
50 \begin{tabular}{@{\hspace{0mm}}rrl@{}} |
53 \multicolumn{3}{c}{values}\\ |
51 \multicolumn{3}{c}{values}\medskip\\ |
54 $v$ & $::=$ & \\ |
52 $v$ & $::=$ & \\ |
55 & & $Empty$ \\ |
53 & & $Empty$ \\ |
56 & $\mid$ & $Char(c)$ \\ |
54 & $\mid$ & $Char(c)$ \\ |
57 & $\mid$ & $Seq(v_1,v_2)$\\ |
55 & $\mid$ & $Seq(v_1,v_2)$\\ |
58 & $\mid$ & $Left(v)$ \\ |
56 & $\mid$ & $Left(v)$ \\ |