handouts/ho04.tex
changeset 319 e7b110f93697
parent 296 796b9b81ac8d
child 326 94700593a2d5
equal deleted inserted replaced
318:7975e4f0d4de 319:e7b110f93697
    11 So far our algorithm based on derivatives was only able to say
    11 So far our algorithm based on derivatives was only able to say
    12 yes or no depending on whether a string was matched by regular
    12 yes or no depending on whether a string was matched by regular
    13 expression or not. Often a more interesting question is to
    13 expression or not. Often a more interesting question is to
    14 find out \emph{how} a regular expression matched a string?
    14 find out \emph{how} a regular expression matched a string?
    15 Answering this question will also help us with the problem we
    15 Answering this question will also help us with the problem we
    16 are after, namely tokenising an input string. The algorithm we
    16 are after, namely tokenising an input string. 
    17 will be looking at for this was designed by Sulzmann \& Lu in
    17 
    18 a rather recent paper. A link to it is provided on KEATS, in
    18 The algorithm we will be looking at was designed by Sulzmann \& Lu in
    19 case you are interested.\footnote{In my humble opinion this is
    19 a rather recent paper. A link to it is provided on KEATS, in case you
    20 an interesting instance of the research literature: it
    20 are interested.\footnote{In my humble opinion this is an interesting
    21 contains a very neat idea, but its presentation is rather
    21   instance of the research literature: it contains a very neat idea,
    22 sloppy. In earlier versions of their paper, students and I
    22   but its presentation is rather sloppy. In earlier versions of their
    23 found several rather annoying typos in their examples and
    23   paper, students and I found several rather annoying typos in their
    24 definitions.}
    24   examples and definitions.}  In order to give an answer for how a
    25 
    25 regular expression matched a string, Sulzmann and Lu introduce
    26 In order to give an answer for how a regular expression
    26 \emph{values}. A value will be the output of the algorithm whenever
    27 matched a string, Sulzmann and Lu introduce \emph{values}. A
    27 the regular expression matches the string. If the string does not
    28 value will be the output of the algorithm whenever the regular
    28 match the string, an error will be raised. Since the first phase of
    29 expression matches the string. If not, an error will be
    29 the algorithm by Sulzmann \& Lu is identical to the derivative based
    30 raised. Since the first phase of the algorithm by Sulzmann \&
    30 matcher from the first coursework, the function $nullable$ will be
    31 Lu is identical to the derivative based matcher from the first
    31 used to decide whether as string is matched by a regular
    32 coursework, the function $nullable$ will be used to decide
    32 expression. If $nullable$ says yes, then values are constructed that
    33 whether as string is matched by a regular expression. If
    33 reflect how the regular expression matched the string. The definitions
    34 $nullable$ says yes, then values are constructed that reflect
    34 for regular expressions $r$ and values $v$ is shown next to each other
    35 how the regular expression matched the string. The definitions
    35 below:
    36 for regular expressions $r$ and values $v$ is shown next to
       
    37 each other below:
       
    38 
    36 
    39 \begin{center}
    37 \begin{center}
    40 \begin{tabular}{cc}
    38 \begin{tabular}{cc}
    41 \begin{tabular}{@{}rrl@{}}
    39 \begin{tabular}{@{}rrl@{}}
    42 \multicolumn{3}{c}{regular expressions}\\
    40 \multicolumn{3}{c}{regular expressions}\medskip\\
    43   $r$ & $::=$  & $\varnothing$\\
    41   $r$ & $::=$  & $\varnothing$\\
    44       & $\mid$ & $\epsilon$   \\
    42       & $\mid$ & $\epsilon$   \\
    45       & $\mid$ & $c$          \\
    43       & $\mid$ & $c$          \\
    46       & $\mid$ & $r_1 \cdot r_2$\\
    44       & $\mid$ & $r_1 \cdot r_2$\\
    47       & $\mid$ & $r_1 + r_2$   \\
    45       & $\mid$ & $r_1 + r_2$   \\
    48   \\
    46   \\
    49       & $\mid$ & $r^*$         \\
    47       & $\mid$ & $r^*$         \\
    50 \end{tabular}
    48 \end{tabular}
    51 &
    49 &
    52 \begin{tabular}{@{\hspace{0mm}}rrl@{}}
    50 \begin{tabular}{@{\hspace{0mm}}rrl@{}}
    53 \multicolumn{3}{c}{values}\\
    51 \multicolumn{3}{c}{values}\medskip\\
    54    $v$ & $::=$  & \\
    52    $v$ & $::=$  & \\
    55       &        & $Empty$   \\
    53       &        & $Empty$   \\
    56       & $\mid$ & $Char(c)$          \\
    54       & $\mid$ & $Char(c)$          \\
    57       & $\mid$ & $Seq(v_1,v_2)$\\
    55       & $\mid$ & $Seq(v_1,v_2)$\\
    58       & $\mid$ & $Left(v)$   \\
    56       & $\mid$ & $Left(v)$   \\