3 \usepackage{../langs} |
3 \usepackage{../langs} |
4 \usepackage{../graphics} |
4 \usepackage{../graphics} |
5 |
5 |
6 |
6 |
7 \begin{document} |
7 \begin{document} |
8 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016} |
8 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017} |
9 |
9 |
10 \section*{Handout 4 (Sulzmann \& Lu Algorithm)} |
10 \section*{Handout 4 (Sulzmann \& Lu Algorithm)} |
11 |
11 |
12 So far our algorithm based on derivatives was only able to say |
12 So far our algorithm based on derivatives was only able to say |
13 yes or no depending on whether a string was matched by a regular |
13 yes or no depending on whether a string was matched by a regular |
26 examples and definitions.} My PhD student Fahad Ausaf and I even more recently |
26 examples and definitions.} My PhD student Fahad Ausaf and I even more recently |
27 wrote a paper where we build on their result: we provided a |
27 wrote a paper where we build on their result: we provided a |
28 mathematical proof that their algorithm is really correct---the proof |
28 mathematical proof that their algorithm is really correct---the proof |
29 Sulzmann \& Lu had originally given contained major flaws. Such correctness |
29 Sulzmann \& Lu had originally given contained major flaws. Such correctness |
30 proofs are important: Kuklewicz maintains a unit-test library |
30 proofs are important: Kuklewicz maintains a unit-test library |
31 for the kind of algorithma we are interested in here and he showed |
31 for the kind of algorithms we are interested in here and he showed |
32 that many implementations in the ``wild'' are buggy, that is not |
32 that many implementations in the ``wild'' are buggy, that is not |
33 satisfy his unit tests: |
33 satisfy his unit tests: |
34 |
34 |
35 \begin{center} |
35 \begin{center} |
36 \url{http://www.haskell.org/haskellwiki/Regex_Posix} |
36 \url{http://www.haskell.org/haskellwiki/Regex_Posix} |
79 each regular expression with the exception of $r_1 + r_2$ |
79 each regular expression with the exception of $r_1 + r_2$ |
80 where there are two values, namely $\Left(v)$ and $Right(v)$ |
80 where there are two values, namely $\Left(v)$ and $Right(v)$ |
81 corresponding to the two alternatives. Note that $r^*$ is |
81 corresponding to the two alternatives. Note that $r^*$ is |
82 associated with a list of values, one for each copy of $r$ |
82 associated with a list of values, one for each copy of $r$ |
83 that was needed to match the string. This means we might also |
83 that was needed to match the string. This means we might also |
84 return the empty list $Stars []$, if no copy was needed in case |
84 return the empty list $Stars []$, if no copy was needed |
85 of $r^*$. For sequence, there is exactly one value, composed |
85 for $r^*$. For sequence, there is exactly one value, composed |
86 of two component values ($v_1$ and $v_2$). |
86 of two component values ($v_1$ and $v_2$). |
87 |
87 |
88 My implementation of regular expressions and values in Scala is |
88 My implementation of regular expressions and values in Scala is |
89 shown below. I have in my implementation the convention that |
89 shown below. I have in my implementation the convention that |
90 regular expressions are written entirely with upper-case |
90 regular expressions are written entirely with upper-case |
379 \noindent The reason is that we look for the value that tells |
379 \noindent The reason is that we look for the value that tells |
380 us how $r_1 + r_2$ could have matched the string, not just |
380 us how $r_1 + r_2$ could have matched the string, not just |
381 $r_2$ or $r_{2s}$. Unfortunately, this is still not the right |
381 $r_2$ or $r_{2s}$. Unfortunately, this is still not the right |
382 value in general because there might be some simplifications |
382 value in general because there might be some simplifications |
383 that happened inside $r_2$ and for which the simplification |
383 that happened inside $r_2$ and for which the simplification |
384 function retuned also a rectification function $f_{2s}$. So in |
384 function returned also a rectification function $f_{2s}$. So in |
385 fact we need to apply this one too which gives |
385 fact we need to apply this one too which gives |
386 |
386 |
387 \begin{center} |
387 \begin{center} |
388 $Right(f_{2s}(v_{2s}))$ |
388 $Right(f_{2s}(v_{2s}))$ |
389 \end{center} |
389 \end{center} |
585 \small |
585 \small |
586 \lstinputlisting[numbers=left,linebackgroundcolor= |
586 \lstinputlisting[numbers=left,linebackgroundcolor= |
587 {\ifodd\value{lstnumber}\color{capri!3}\fi}]{../progs/app61.scala} |
587 {\ifodd\value{lstnumber}\color{capri!3}\fi}]{../progs/app61.scala} |
588 |
588 |
589 \caption{The Scala code for the simplification function. The |
589 \caption{The Scala code for the simplification function. The |
590 first part defines some auxillary functions for the rectification. |
590 first part defines some auxiliary functions for the rectification. |
591 The second part give the simplification function. |
591 The second part give the simplification function. |
592 \label{simprect}} |
592 \label{simprect}} |
593 \end{figure} |
593 \end{figure} |
594 |
594 |
595 |
595 |
605 \end{tabular} |
605 \end{tabular} |
606 \end{center} |
606 \end{center} |
607 |
607 |
608 \noindent This corresponds to the $matches$ function we have |
608 \noindent This corresponds to the $matches$ function we have |
609 seen in earlier lectures. In the first clause we are given an |
609 seen in earlier lectures. In the first clause we are given an |
610 empty string, $[]$, and need to test wether the regular |
610 empty string, $[]$, and need to test whether the regular |
611 expression is $nullable$. If yes, we can proceed normally and |
611 expression is $nullable$. If yes, we can proceed normally and |
612 just return the value calculated by $\textit{mkeps}$. The second clause |
612 just return the value calculated by $\textit{mkeps}$. The second clause |
613 is for strings where the first character is $c$, say, and the |
613 is for strings where the first character is $c$, say, and the |
614 rest of the string is $s$. We first build the derivative of |
614 rest of the string is $s$. We first build the derivative of |
615 $r$ with respect to $c$; simplify the resulting regular |
615 $r$ with respect to $c$; simplify the resulting regular |