6 \usepackage{../data} |
6 \usepackage{../data} |
7 |
7 |
8 |
8 |
9 \begin{document} |
9 \begin{document} |
10 \fnote{\copyright{} Christian Urban, King's College London, |
10 \fnote{\copyright{} Christian Urban, King's College London, |
11 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023} |
11 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024} |
12 |
12 |
13 |
13 |
14 \section*{Handout 2 (Regular Expression Matching)} |
14 \section*{Handout 2 (Regular Expression Matching)} |
15 |
15 |
16 %\noindent |
16 %\noindent |
387 matches a string of the form $c\!::\!s$, then the first part |
387 matches a string of the form $c\!::\!s$, then the first part |
388 must be matched by $r_1$. Consequently, it makes sense to |
388 must be matched by $r_1$. Consequently, it makes sense to |
389 construct the regular expression for $s$ by calling $\textit{der}$ with |
389 construct the regular expression for $s$ by calling $\textit{der}$ with |
390 $r_1$ and ``appending'' $r_2$. There is however one exception |
390 $r_1$ and ``appending'' $r_2$. There is however one exception |
391 to this simple rule: if $r_1$ can match the empty string, then |
391 to this simple rule: if $r_1$ can match the empty string, then |
392 all of $c\!::\!s$ is matched by $r_2$. So in case $r_1$ is |
392 all of $c\!::\!s$ is matched by $r_2$. Therefore in case $r_1$ is |
393 nullable (that is can match the empty string) we have to allow |
393 nullable (that is can match the empty string) we have to allow |
394 the choice $\textit{der}\,c\,r_2$ for calculating the regular |
394 the choice $\textit{der}\,c\,r_2$ for calculating the regular |
395 expression that can match $s$. Therefore we have to add the |
395 expression that can match $s$. This means we have to add the |
396 regular expression $\textit{der}\,c\,r_2$ in the result. The $*$-case |
396 regular expression $\textit{der}\,c\,r_2$ in the result. The $*$-case |
397 is again simple: if $r^*$ matches a string of the form |
397 is again simple: if $r^*$ matches a string of the form |
398 $c\!::\!s$, then the first part must be ``matched'' by a |
398 $c\!::\!s$, then the first part must be ``matched'' by a |
399 single copy of $r$. Therefore we call recursively $\textit{der}\,c\,r$ |
399 single copy of $r$. Therefore we call recursively $\textit{der}\,c\,r$ |
400 and ``append'' $r^*$ in order to match the rest of $s$. Still |
400 and ``append'' $r^*$ in order to match the rest of $s$. Still |
786 \subsection*{Epilogue} |
786 \subsection*{Epilogue} |
787 |
787 |
788 (23/Aug/2016) I found another place where this algorithm can |
788 (23/Aug/2016) I found another place where this algorithm can |
789 be sped up (this idea is not integrated with what is coming next, but |
789 be sped up (this idea is not integrated with what is coming next, but |
790 I present it nonetheless). The idea is to not define \texttt{ders} |
790 I present it nonetheless). The idea is to not define \texttt{ders} |
791 that it iterates the derivative character-by-character, but in bigger |
791 so that it iterates the derivative character-by-character, but in bigger |
792 chunks. The resulting code for \texttt{ders2} looks as follows: |
792 chunks. The resulting code for \texttt{ders2} looks as follows: |
793 |
793 |
794 \lstinputlisting[numbers=none]{../progs/app52.scala} |
794 \lstinputlisting[numbers=none]{../progs/app52.scala} |
795 |
795 |
796 \noindent |
796 \noindent |