276 $(r_1 \cdot r_2)^*$ & $\equiv$ & $1 + r_1\cdot (r_2 \cdot r_1)^* \cdot r_2$\\ |
276 $(r_1 \cdot r_2)^*$ & $\equiv$ & $1 + r_1\cdot (r_2 \cdot r_1)^* \cdot r_2$\\ |
277 \end{tabular} |
277 \end{tabular} |
278 \end{center} |
278 \end{center} |
279 |
279 |
280 \noindent |
280 \noindent |
281 We will not use them in our algorithm, but feel free to convince you |
281 We will not use them in our algorithm, but feel free to convince yourself |
282 that they hold. As an aside, there has been a lot of research about |
282 that they hold. As an aside, there has been a lot of research about |
283 questions like: Can one always decide when two regular expressions are |
283 questions like: Can one always decide when two regular expressions are |
284 equivalent or not? What does an algorithm look like to decide this |
284 equivalent or not? What does an algorithm look like to decide this |
285 efficiently? So in general it is not a trivial problem. |
285 efficiently? So in general it is not a trivial problem. |
286 |
286 |
476 \noindent holds, which means our algorithm satisfies the |
476 \noindent holds, which means our algorithm satisfies the |
477 specification. Of course we can claim many things\ldots |
477 specification. Of course we can claim many things\ldots |
478 whether the claim holds any water is a different question, |
478 whether the claim holds any water is a different question, |
479 which for example is the point of the Strand-2 Coursework. |
479 which for example is the point of the Strand-2 Coursework. |
480 |
480 |
481 This algorithm was introduced by Janus Brzozowski in 1964, but |
481 This algorithm was introduced by Janusz Brzozowski in 1964, but |
482 is more widely known only in the last 10 or so years. Its |
482 is more widely known only in the last 10 or so years. Its |
483 main attractions are simplicity and being fast, as well as |
483 main attractions are simplicity and being fast, as well as |
484 being easily extendable for other regular expressions such as |
484 being easily extendible for other regular expressions such as |
485 $r^{\{n\}}$, $r^?$, $\sim{}r$ and so on (this is subject of |
485 $r^{\{n\}}$, $r^?$, $\sim{}r$ and so on (this is subject of |
486 Strand-1 Coursework 1). |
486 Strand-1 Coursework 1). |
487 |
487 |
488 \subsection*{The Matching Algorithm in Scala} |
488 \subsection*{The Matching Algorithm in Scala} |
489 |
489 |
541 %This is not an error: it hardly takes more than half a second for |
541 %This is not an error: it hardly takes more than half a second for |
542 %strings up to the length of 6500. After that we receive a |
542 %strings up to the length of 6500. After that we receive a |
543 %StackOverflow exception, but still\ldots |
543 %StackOverflow exception, but still\ldots |
544 |
544 |
545 For running the algorithm with our first example, the evil |
545 For running the algorithm with our first example, the evil |
546 regular expression $a^?{}^{\{n\}}a^{\{n\}}$, we need to implement |
546 regular expression $a^?{}^{\{n\}}\cdot a^{\{n\}}$, we need to implement |
547 the optional regular expression and the `exactly $n$-times |
547 the optional regular expression and the `exactly $n$-times |
548 regular expression'. This can be done with the translations |
548 regular expression'. This can be done with the translations |
549 |
549 |
550 \lstinputlisting[numbers=none]{../progs/app51.scala} |
550 \lstinputlisting[numbers=none]{../progs/app51.scala} |
551 |
551 |
695 \caption{The simplification function and modified |
695 \caption{The simplification function and modified |
696 \texttt{ders}-function; this function now |
696 \texttt{ders}-function; this function now |
697 calls \texttt{der} first, but then simplifies |
697 calls \texttt{der} first, but then simplifies |
698 the resulting derivative regular expressions before |
698 the resulting derivative regular expressions before |
699 building the next derivative, see |
699 building the next derivative, see |
700 Line~\ref{simpline}.\label{scala2}} |
700 Line~24.\label{scala2}} |
701 \end{figure} |
701 \end{figure} |
702 |
702 |
703 \begin{center} |
703 \begin{center} |
704 \begin{tikzpicture} |
704 \begin{tikzpicture} |
705 \begin{axis}[ |
705 \begin{axis}[ |
727 |
727 |
728 \noindent |
728 \noindent |
729 To recap, Python and Ruby needed approximately 30 seconds to match a |
729 To recap, Python and Ruby needed approximately 30 seconds to match a |
730 string of 28 \texttt{a}s and the regular expression $a^{?\{n\}} \cdot |
730 string of 28 \texttt{a}s and the regular expression $a^{?\{n\}} \cdot |
731 a^{\{n\}}$. We need a third of this time to do the same with strings |
731 a^{\{n\}}$. We need a third of this time to do the same with strings |
732 up to 11,000 \texttt{a}s. Similarly, Java and Python needed 30 |
732 up to 11,000 \texttt{a}s. Similarly, Java 8 and Python needed 30 |
733 seconds to find out the regular expression $(a^*)^* \cdot b$ does not |
733 seconds to find out the regular expression $(a^*)^* \cdot b$ does not |
734 match the string of 28 \texttt{a}s. We can do the same in the same amount of time |
734 match the string of 28 \texttt{a}s. In Java 9 and later this has been |
735 for strings composed of nearly 6,000,000 \texttt{a}s: |
735 cranked up to 39,000 \texttt{a}s, but we can do the same in the same |
|
736 amount of time for strings composed of nearly 6,000,000 \texttt{a}s: |
736 |
737 |
737 |
738 |
738 \begin{center} |
739 \begin{center} |
739 \begin{tikzpicture} |
740 \begin{tikzpicture} |
740 \begin{axis}[ |
741 \begin{axis}[ |
1044 meets its specification. To have done so, requires a few induction |
1045 meets its specification. To have done so, requires a few induction |
1045 proofs about strings and regular expressions. Following the \emph{induction |
1046 proofs about strings and regular expressions. Following the \emph{induction |
1046 recipes} is already a big step in actually performing these proofs. |
1047 recipes} is already a big step in actually performing these proofs. |
1047 If you do not believe it, proofs have helped me to make sure my code |
1048 If you do not believe it, proofs have helped me to make sure my code |
1048 is correct and in several instances prevented me of letting slip |
1049 is correct and in several instances prevented me of letting slip |
1049 embarrassing mistakes into the `wild'. |
1050 embarrassing mistakes into the `wild'. |
1050 |
1051 |
1051 \end{document} |
1052 \end{document} |
1052 |
1053 |
1053 |
1054 |
1054 |
1055 |
1055 |
1056 % !TeX program = latexmk -xelatex |
1056 %%% Local Variables: |
1057 %%% Local Variables: |
1057 %%% mode: latex |
1058 %%% mode: latex |
1058 %%% TeX-master: t |
1059 %%% TeX-master: t |
1059 %%% End: |
1060 %%% End: |