handouts/ho04.tex
changeset 936 0b5f06539a84
parent 874 ffe02fd574a5
child 941 66adcae6c762
equal deleted inserted replaced
935:4e221cf587fa 936:0b5f06539a84
    15 expression or not. Often a more interesting question is to
    15 expression or not. Often a more interesting question is to
    16 find out \emph{how} a regular expression matched a string?
    16 find out \emph{how} a regular expression matched a string?
    17 Answering this question will also help us with the problem we
    17 Answering this question will also help us with the problem we
    18 are after, namely tokenising an input string. 
    18 are after, namely tokenising an input string. 
    19 
    19 
    20 The algorithm we will be looking at in this lecture was designed by Sulzmann
    20 The algorithm we will be looking at in this lecture was designed by
    21 \& Lu in a rather recent research paper (from 2014). A link to it is
    21 Sulzmann \& Lu in a rather recent research paper (from 2014). A link
    22 provided on KEATS, in case you are interested.\footnote{In my
    22 to it is provided on KEATS, in case you are interested.\footnote{In my
    23 humble opinion this is an interesting instance of the research
    23   humble opinion this is an interesting instance of the research
    24 literature: it contains a very neat idea, but its presentation
    24   literature: it contains very clever ideas, but its presentation is
    25 is rather sloppy. In earlier versions of this paper, a King's
    25   rather sloppy. In earlier versions of this paper, students and I
    26 undergraduate student and I found several rather annoying typos in the
    26   found several rather annoying typos in the examples and definitions;
    27 examples and definitions.} My former PhD student Fahad Ausaf and I even more recently 
    27   we even found downright errors in their work.} Together with my former PhD
    28 wrote a paper where we build on their result: we provided a 
    28 students Fahad Ausaf and Chengsong Tan we wrote several papers where
    29 mathematical proof that their algorithm is really correct---the proof 
    29 we build on their result: we provided a mathematical proof that their
    30 Sulzmann \& Lu had originally given contained major flaws. Such correctness
    30 algorithm is really correct---the proof Sulzmann \& Lu had originally
    31 proofs are important: Kuklewicz maintains a unit-test library
    31 given contained major flaws. Such correctness proofs are important:
    32 for the kind of algorithms we are interested in here and he showed 
    32 Kuklewicz maintains a unit-test library for the kind of algorithms we
    33 that many implementations in the ``wild'' are buggy, that is not
    33 are interested in here and he showed that many implementations in the
    34 satisfy his unit tests:
    34 ``wild'' are buggy, that is not satisfy his unit tests:
    35 
    35 
    36 \begin{center}
    36 \begin{center}
    37 \url{http://www.haskell.org/haskellwiki/Regex_Posix}
    37 \url{http://www.haskell.org/haskellwiki/Regex_Posix}
    38 \end{center}
    38 \end{center}
    39 
    39 
   431 
   431 
   432 Phew\ldots{}Sweat\ldots!\#@$\skull$\%\ldots Unfortunately, there is
   432 Phew\ldots{}Sweat\ldots!\#@$\skull$\%\ldots Unfortunately, there is
   433 a gigantic problem with the described algorithm so far: it is very
   433 a gigantic problem with the described algorithm so far: it is very
   434 slow. To make it faster, we have to include in all this the simplification 
   434 slow. To make it faster, we have to include in all this the simplification 
   435 from Lecture 2\ldots{}and what rotten luck: simplification messes things 
   435 from Lecture 2\ldots{}and what rotten luck: simplification messes things 
   436 up and we need to rectify the mess. This is what we shall do next.
   436 up and we need to \emph{rectify} the mess. This is what we shall do next.
   437 
   437 
   438 
   438 
   439 \subsubsection*{Simplification}
   439 \subsubsection*{Simplification}
   440 
   440 
   441 Generally the matching algorithms based on derivatives do
   441 Generally the matching algorithms based on derivatives do