handouts/ho04.tex
changeset 520 2849c305b12d
parent 492 39b7ff2cf1bc
child 551 bd551ede2be6
equal deleted inserted replaced
519:955d5b3b0619 520:2849c305b12d
     3 \usepackage{../langs}
     3 \usepackage{../langs}
     4 \usepackage{../graphics}
     4 \usepackage{../graphics}
     5 
     5 
     6 
     6 
     7 \begin{document}
     7 \begin{document}
     8 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016}
     8 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017}
     9 
     9 
    10 \section*{Handout 4 (Sulzmann \& Lu Algorithm)}
    10 \section*{Handout 4 (Sulzmann \& Lu Algorithm)}
    11 
    11 
    12 So far our algorithm based on derivatives was only able to say
    12 So far our algorithm based on derivatives was only able to say
    13 yes or no depending on whether a string was matched by a regular
    13 yes or no depending on whether a string was matched by a regular
    26 examples and definitions.} My PhD student Fahad Ausaf and I even more recently 
    26 examples and definitions.} My PhD student Fahad Ausaf and I even more recently 
    27 wrote a paper where we build on their result: we provided a 
    27 wrote a paper where we build on their result: we provided a 
    28 mathematical proof that their algorithm is really correct---the proof 
    28 mathematical proof that their algorithm is really correct---the proof 
    29 Sulzmann \& Lu had originally given contained major flaws. Such correctness
    29 Sulzmann \& Lu had originally given contained major flaws. Such correctness
    30 proofs are important: Kuklewicz maintains a unit-test library
    30 proofs are important: Kuklewicz maintains a unit-test library
    31 for the kind of algorithma we are interested in here and he showed 
    31 for the kind of algorithms we are interested in here and he showed 
    32 that many implementations in the ``wild'' are buggy, that is not
    32 that many implementations in the ``wild'' are buggy, that is not
    33 satisfy his unit tests:
    33 satisfy his unit tests:
    34 
    34 
    35 \begin{center}
    35 \begin{center}
    36 \url{http://www.haskell.org/haskellwiki/Regex_Posix}
    36 \url{http://www.haskell.org/haskellwiki/Regex_Posix}
    79 each regular expression with the exception of $r_1 + r_2$
    79 each regular expression with the exception of $r_1 + r_2$
    80 where there are two values, namely $\Left(v)$ and $Right(v)$
    80 where there are two values, namely $\Left(v)$ and $Right(v)$
    81 corresponding to the two alternatives. Note that $r^*$ is
    81 corresponding to the two alternatives. Note that $r^*$ is
    82 associated with a list of values, one for each copy of $r$
    82 associated with a list of values, one for each copy of $r$
    83 that was needed to match the string. This means we might also
    83 that was needed to match the string. This means we might also
    84 return the empty list $Stars []$, if no copy was needed in case
    84 return the empty list $Stars []$, if no copy was needed
    85 of $r^*$. For sequence, there is exactly one value, composed 
    85 for $r^*$. For sequence, there is exactly one value, composed 
    86 of two component values ($v_1$ and $v_2$).
    86 of two component values ($v_1$ and $v_2$).
    87 
    87 
    88 My implementation of regular expressions and values in Scala is
    88 My implementation of regular expressions and values in Scala is
    89 shown below. I have in my implementation the convention that
    89 shown below. I have in my implementation the convention that
    90 regular expressions are written entirely with upper-case
    90 regular expressions are written entirely with upper-case
   379 \noindent The reason is that we look for the value that tells
   379 \noindent The reason is that we look for the value that tells
   380 us how $r_1 + r_2$ could have matched the string, not just
   380 us how $r_1 + r_2$ could have matched the string, not just
   381 $r_2$ or $r_{2s}$. Unfortunately, this is still not the right
   381 $r_2$ or $r_{2s}$. Unfortunately, this is still not the right
   382 value in general because there might be some simplifications
   382 value in general because there might be some simplifications
   383 that happened inside $r_2$ and for which the simplification
   383 that happened inside $r_2$ and for which the simplification
   384 function retuned also a rectification function $f_{2s}$. So in
   384 function returned also a rectification function $f_{2s}$. So in
   385 fact we need to apply this one too which gives
   385 fact we need to apply this one too which gives
   386 
   386 
   387 \begin{center}
   387 \begin{center}
   388 $Right(f_{2s}(v_{2s}))$
   388 $Right(f_{2s}(v_{2s}))$
   389 \end{center}
   389 \end{center}
   585 \small  
   585 \small  
   586 \lstinputlisting[numbers=left,linebackgroundcolor=
   586 \lstinputlisting[numbers=left,linebackgroundcolor=
   587    {\ifodd\value{lstnumber}\color{capri!3}\fi}]{../progs/app61.scala}
   587    {\ifodd\value{lstnumber}\color{capri!3}\fi}]{../progs/app61.scala}
   588 
   588 
   589 \caption{The Scala code for the simplification function. The
   589 \caption{The Scala code for the simplification function. The
   590 first part defines some auxillary functions for the rectification.
   590 first part defines some auxiliary functions for the rectification.
   591 The second part give the simplification function.
   591 The second part give the simplification function.
   592 \label{simprect}}
   592 \label{simprect}}
   593 \end{figure}
   593 \end{figure}
   594 
   594 
   595 
   595 
   605 \end{tabular}
   605 \end{tabular}
   606 \end{center}
   606 \end{center}
   607 
   607 
   608 \noindent This corresponds to the $matches$ function we have
   608 \noindent This corresponds to the $matches$ function we have
   609 seen in earlier lectures. In the first clause we are given an
   609 seen in earlier lectures. In the first clause we are given an
   610 empty string, $[]$, and need to test wether the regular
   610 empty string, $[]$, and need to test whether the regular
   611 expression is $nullable$. If yes, we can proceed normally and
   611 expression is $nullable$. If yes, we can proceed normally and
   612 just return the value calculated by $\textit{mkeps}$. The second clause
   612 just return the value calculated by $\textit{mkeps}$. The second clause
   613 is for strings where the first character is $c$, say, and the
   613 is for strings where the first character is $c$, say, and the
   614 rest of the string is $s$. We first build the derivative of
   614 rest of the string is $s$. We first build the derivative of
   615 $r$ with respect to $c$; simplify the resulting regular
   615 $r$ with respect to $c$; simplify the resulting regular