\documentclass{article}+ −
\usepackage{../style}+ −
\usepackage{../langs}+ −
\usepackage{../graphics}+ −
+ −
+ −
\begin{document}+ −
+ −
\section*{Handout 4 (Sulzmann \& Lu Algorithm)}+ −
+ −
So far our algorithm based on derivatives was only able to say+ −
yes or no depending on whether a string was matched by regular+ −
expression or not. Often a more interesting question is to+ −
find out \emph{how} a regular expression matched a string?+ −
Answering this question will help us with the problem we are+ −
after, namely tokenising an input string, that is splitting it+ −
up into its ``word'' components. The algorithm we will be+ −
looking at was designed by Sulzmann \& Lu in a rather recent+ −
paper. A link to it is provided at KEATS, in case you are+ −
interested.\footnote{In my humble opinion this is an+ −
interesting instance of the research literature: it contains a+ −
very neat idea, but its presentation is rather sloppy.+ −
Students and I found several rather annoying typos in their+ −
examples and definitions.}+ −
+ −
In order to give an answer for how a regular expression matched+ −
a string, Sulzmann and Lu introduce \emph{values}. A value + −
will be the output of the algorithm whenever the regular + −
expression matches the string. If not an error will be raised.+ −
Since the first phase of the algorithm is identical to the+ −
derivative based matcher from the first coursework, the + −
function $nullable$ will be used to decide whether as string+ −
is matched by a regular expression.+ −
+ −
+ −
+ −
+ −
Algorithm by Sulzmann, Lexing+ −
+ −
\end{document}+ −
+ −
%%% Local Variables: + −
%%% mode: latex+ −
%%% TeX-master: t+ −
%%% End: + −