equal
deleted
inserted
replaced
21 the regular expressions $a^?{}^{\{n\}}\cdot a^{\{n\}}$ and strings |
21 the regular expressions $a^?{}^{\{n\}}\cdot a^{\{n\}}$ and strings |
22 also composed of $n$ \pcode{a}s (this time the regular expressions |
22 also composed of $n$ \pcode{a}s (this time the regular expressions |
23 match the strings). To see the substantial differences in the left |
23 match the strings). To see the substantial differences in the left |
24 and right plots below, note the different scales of the $x$-axes. |
24 and right plots below, note the different scales of the $x$-axes. |
25 |
25 |
26 |
26 |
27 \begin{center} |
27 \begin{center} |
28 Graphs: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$ |
28 Graphs: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$ |
29 \begin{tabular}{@{}cc@{}} |
29 \begin{tabular}{@{}cc@{}} |
30 \begin{tikzpicture} |
30 \begin{tikzpicture} |
31 \begin{axis}[ |
31 \begin{axis}[ |
280 \noindent |
280 \noindent |
281 We will not use them in our algorithm, but feel free to convince you |
281 We will not use them in our algorithm, but feel free to convince you |
282 that they hold. As an aside, there has been a lot of research about |
282 that they hold. As an aside, there has been a lot of research about |
283 questions like: Can one always decide when two regular expressions are |
283 questions like: Can one always decide when two regular expressions are |
284 equivalent or not? What does an algorithm look like to decide this |
284 equivalent or not? What does an algorithm look like to decide this |
285 efficiently? |
285 efficiently? So in general it is not a trivial problem. |
286 |
286 |
287 \subsection*{The Matching Algorithm} |
287 \subsection*{The Matching Algorithm} |
288 |
288 |
289 The algorithm we will define below consists of two parts. One |
289 The algorithm we will define below consists of two parts. One |
290 is the function $\textit{nullable}$ which takes a regular expression as |
290 is the function $\textit{nullable}$ which takes a regular expression as |
724 \end{axis} |
724 \end{axis} |
725 \end{tikzpicture} |
725 \end{tikzpicture} |
726 \end{center} |
726 \end{center} |
727 |
727 |
728 \noindent |
728 \noindent |
729 To reacap, Python and Ruby needed approximately 30 seconds to match a |
729 To recap, Python and Ruby needed approximately 30 seconds to match a |
730 string of 28 \texttt{a}s and the regular expression $a^{?\{n\}} \cdot |
730 string of 28 \texttt{a}s and the regular expression $a^{?\{n\}} \cdot |
731 a^{\{n\}}$. We need a third of this time to do the same with strings |
731 a^{\{n\}}$. We need a third of this time to do the same with strings |
732 up to 11,000 \texttt{a}s. Similarly, Java and Python needed 30 |
732 up to 11,000 \texttt{a}s. Similarly, Java and Python needed 30 |
733 seconds to find out the regular expression $(a^*)^* \cdot b$ does not |
733 seconds to find out the regular expression $(a^*)^* \cdot b$ does not |
734 match the string of 28 \texttt{a}s. We can do the same in |
734 match the string of 28 \texttt{a}s. We can do the same in |
772 \noindent |
772 \noindent |
773 I have not fully understood why this version is much faster, |
773 I have not fully understood why this version is much faster, |
774 but it seems it is a combination of the clauses for \texttt{ALT} |
774 but it seems it is a combination of the clauses for \texttt{ALT} |
775 and \texttt{SEQ}. In the latter case we call \texttt{der} with |
775 and \texttt{SEQ}. In the latter case we call \texttt{der} with |
776 a single character and this potentially produces an alternative. |
776 a single character and this potentially produces an alternative. |
777 The derivative of such an alternative can then be more effeciently |
777 The derivative of such an alternative can then be more efficiently |
778 calculated by \texttt{ders2} since it pushes a whole string |
778 calculated by \texttt{ders2} since it pushes a whole string |
779 under an \texttt{ALT}. The numbers are that in the second case |
779 under an \texttt{ALT}. The numbers are that in the second case |
780 $(a^*)^* \cdot b$ both versions are pretty much the same, but in the |
780 $(a^*)^* \cdot b$ both versions are pretty much the same, but in the |
781 first case $a^{?\{n\}} \cdot a^{\{n\}}$ the improvement gives |
781 first case $a^{?\{n\}} \cdot a^{\{n\}}$ the improvement gives |
782 another factor of 100 speedup. Nice! |
782 another factor of 100 speedup. Nice! |
1044 meets its specification. To have done so, requires a few induction |
1044 meets its specification. To have done so, requires a few induction |
1045 proofs about strings and regular expressions. Following the \emph{induction |
1045 proofs about strings and regular expressions. Following the \emph{induction |
1046 recipes} is already a big step in actually performing these proofs. |
1046 recipes} is already a big step in actually performing these proofs. |
1047 If you do not believe it, proofs have helped me to make sure my code |
1047 If you do not believe it, proofs have helped me to make sure my code |
1048 is correct and in several instances prevented me of letting slip |
1048 is correct and in several instances prevented me of letting slip |
1049 embarassing mistakes into the `wild'. |
1049 embarrassing mistakes into the `wild'. |
1050 |
1050 |
1051 \end{document} |
1051 \end{document} |
1052 |
1052 |
1053 |
1053 |
1054 |
1054 |