\documentclass{article}\usepackage{../style}\usepackage{../langs}\begin{document}\section*{Coursework 8 (Scala, Regular Expressions)}This coursework is worth 10\%. It is about regular expressions,pattern matching and polymorphism. The first part is due on 30November at 11pm; the second, more advanced part, is due on 7 Decemberat 11pm. You are asked to implement a regular expression matcher. Makesure the files you submit can be processed by just calling\texttt{scala <<filename.scala>>}.\bigskip\noindent\textbf{Important:} Do not use any mutable data structures in yoursubmission! They are not needed. This menas you cannot use \texttt{ListBuffer}s, for example. Do not use \texttt{return} in yourcode! It has a different meaning in Scala, than in Java. Do not use\texttt{var}! This declares a mutable variable. Make sure thefunctions you submit are defined on the ``top-level'' of Scala, notinside a class or object. Also note that the running time ofeach part will be restricted to a maximum of 360 seconds on mylaptop.\subsection*{Disclaimer}It should be understood that the work you submit representsyour own effort! You have not copied from anyone else. Anexception is the Scala code I showed during the lectures oruploaded to KEATS, which you can freely use.\bigskip\subsection*{Part 1 (6 Marks)}The task is to implement a regular expression matcher that is based onderivatives of regular expressions. The implementation can dealwith the following regular expressions, which have been predefinedin the file re.scala:\begin{center}\begin{tabular}{lcll} $r$ & $::=$ & $\ZERO$ & cannot match anything\\ & $|$ & $\ONE$ & can only match the empty string\\ & $|$ & $c$ & can match a character (in this case $c$)\\ & $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\ & $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\ & & & then the second part with $r_2$\\ & $|$ & $r^*$ & can match zero or more times $r$\\\end{tabular}\end{center}\noindent Why? Knowing how to match regular expressions and strings willlet you solve a lot of problems that vex other humans. Regularexpressions are one of the fastest and simplest ways to match patternsin text, and are endlessly useful for searching, editing andanalysing text in all sorts of places. However, you need to befast, otherwise you will stumble over problems such as recently reported at{\small\begin{itemize}\item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}\item[$\bullet$] \url{https://vimeo.com/112065252}\item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/} \end{itemize}}\subsubsection*{Tasks (file re.scala)}\begin{itemize}\item[(1a)] Implement a function, called \textit{nullable}, by recursion over regular expressions. This function tests whether a regular expression can match the empty string.\begin{center}\begin{tabular}{lcl}$\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\$\textit{nullable}(\ONE)$ & $\dn$ & $\textit{true}$\\$\textit{nullable}(c)$ & $\dn$ & $\textit{false}$\\$\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\$\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\$\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\\end{tabular}\end{center}\hfill[1 Mark]\item[(1b)] Implement a function, called \textit{der}, by recursion over regular expressions. It takes a character and a regular expression as arguments and calculates the derivative regular expression according to the rules:\begin{center}\begin{tabular}{lcl}$\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\$\textit{der}\;c\;(\ONE)$ & $\dn$ & $\ZERO$\\$\textit{der}\;c\;(d)$ & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\$\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\$\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\ & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\ & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\$\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\\end{tabular}\end{center}For example given the regular expression $r = (a \cdot b) \cdot c$, the derivativesw.r.t.~the characters $a$, $b$ and $c$ are\begin{center} \begin{tabular}{lcll} $\textit{der}\;a\;r$ & $=$ & $(\ONE \cdot b)\cdot c$ & ($= r'$)\\ $\textit{der}\;b\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$\\ $\textit{der}\;c\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$ \end{tabular}\end{center}Let $r'$ stand for the first derivative, then taking the derivatives of $r'$w.r.t.~the characters $a$, $b$ and $c$ gives\begin{center} \begin{tabular}{lcll} $\textit{der}\;a\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \\ $\textit{der}\;b\;r'$ & $=$ & $((\ZERO \cdot b) + \ONE)\cdot c$ & ($= r''$)\\ $\textit{der}\;c\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \end{tabular}\end{center}One more example: Let $r''$ stand for the second derivative above,then taking the derivatives of $r''$ w.r.t.~the characters $a$, $b$and $c$ gives\begin{center} \begin{tabular}{lcll} $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\ $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\ $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$ \end{tabular}\end{center}Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\\mbox{}\hfill\mbox{[1 Mark]}\item[(1c)] Implement the function \textit{simp}, which recursively traverses a regular expression from the inside to the outside, and simplifies every sub-regular-expression on the left (see below) to the regular expression on the right, except it does not simplify inside ${}^*$-regular expressions. \begin{center}\begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}$r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ $r \cdot \ONE$ & $\mapsto$ & $r$\\ $\ONE \cdot r$ & $\mapsto$ & $r$\\ $r + \ZERO$ & $\mapsto$ & $r$\\ $\ZERO + r$ & $\mapsto$ & $r$\\ $r + r$ & $\mapsto$ & $r$\\ \end{tabular} \end{center} For example the regular expression \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\] simplifies to just $r_1$. \textbf{Hints:} Regular expressions can be seen as trees and there are several methods for traversing trees. One of them corresponds to the inside-out traversal. Also remember numerical expressions from school: there you had exprssions like $u + \ldots + (1 \cdot x) * \ldots (z + (y \cdot 0)) \ldots$ and simplification rules that looked very similar to rules above. You would simplify such numerical expressions by replacing for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then look if more rules are applicable. If you organise this simplification in an inside-out fashion, it is always clear which rule should applied next.\\\mbox{}\hfill[1 Mark]\item[(1d)] Implement two functions: The first, called \textit{ders}, takes a list of characters and a regular expression as arguments, and builds the derivative w.r.t.~the list as follows:\begin{center}\begin{tabular}{lcl}$\textit{ders}\;(Nil)\;r$ & $\dn$ & $r$\\ $\textit{ders}\;(c::cs)\;r$ & $\dn$ & $\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\\end{tabular}\end{center}Note that this function is different from \textit{der}, which onlytakes a single character.The second function, called \textit{matcher}, takes a string and aregular expression as arguments. It builds first the derivativesaccording to \textit{ders} and after that tests whether the resultingderivative regular expression can match the empty string (using\textit{nullable}). For example the \textit{matcher} will producetrue given the regular expression $(a\cdot b)\cdot c$ and the string$abc$.\\ \mbox{}\hfill[1 Mark]\item[(1e)] Implement the function $\textit{replace}\;r\;s_1\;s_2$: it searches (from the left to right) in the string $s_1$ all the non-empty substrings that match the regular expression $r$---these substrings are assumed to bethe longest substrings matched by the regular expression andassumed to be non-overlapping. All these substrings in $s_1$ matched by $r$are replaced by $s_2$. For example given the regular expression\[(a \cdot a)^* + (b \cdot b)\]\noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ andreplacement the string $s_2 = c$ yields the string\[ccbcabcaccc\]\hfill[2 Marks]\end{itemize}\subsection*{Part 2 (4 Marks)}You need to copy all the code from \texttt{re.scala} into\texttt{re2.scala} in order to complete Part 2. Parts (2a) and (2b)give you another method and datapoints for testing the \textit{der}and \textit{simp} functions from Part~1.\subsubsection*{Tasks (file re2.scala)}\begin{itemize}\item[(2a)] Write a \textbf{polymorphic} function, called \textit{iterT}, that is \textbf{tail-recursive}(!) and takes an integer $n$, a function $f$ and an $x$ as arguments. This function should iterate $f$ $n$-times starting with the argument $x$, like \[\underbrace{f(\ldots (f}_{n\text{-times}}(x))) \] More formally that means \textit{iterT} behaves as follows: \begin{center} \begin{tabular}{lcl} $\textit{iterT}(n, f, x)$ & $\dn$ & $\begin{cases} \;x & \textit{if}\;n = 0\\ \;f(\textit{iterT}(n - 1, f, x)) & \textit{otherwise} \end{cases}$ \end{tabular}\end{center} Make sure you write a \textbf{tail-recursive} version of \textit{iterT}. If you add the annotation \texttt{@tailrec} (see below) your code should not produce an error message. \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm] import scala.annotation.tailrec @tailrec def iterT[A](n: Int, f: A => A, x: A): A = ... \end{lstlisting} You can assume that \textit{iterT} will only be called for positive integers $0 \le n$. Given the type variable \texttt{A}, the type of $f$ is \texttt{A => A} and the type of $x$ is \texttt{A}. This means \textit{iterT} can be used, for example, for functions from integers to integers, or strings to strings, or regular expressions to regular expressions. \\ \mbox{}\hfill[2 Marks]\item[(2b)] Implement a function, called \textit{size}, by recursion over regular expressions. If a regular expression is seen as a tree, then \textit{size} should return the number of nodes in such a tree. Therefore this function is defined as follows:\begin{center}\begin{tabular}{lcl}$\textit{size}(\ZERO)$ & $\dn$ & $1$\\$\textit{size}(\ONE)$ & $\dn$ & $1$\\$\textit{size}(c)$ & $\dn$ & $1$\\$\textit{size}(r_1 + r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\$\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\$\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\\end{tabular}\end{center}You can use \textit{size} and \textit{iterT} in order to test how muchthe 'evil' regular expression $(a^*)^* \cdot b$ grows when takingsuccessive derivatives according the letter $a$ and then compare it totaking the derivative, but simlifying the derivative after each step.For example, the calls \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm] size(iterT(20, (r: Rexp) => der('a', r), EVIL)) size(iterT(20, (r: Rexp) => simp(der('a', r)), EVIL)) \end{lstlisting} produce without simplification a regular expression of size of 7340068 after 20 iterations, while the one with simplification gives just 8.\\ \mbox{}\hfill[1 Mark]\item[(2c)] Write a \textbf{polymorphic} function, called \textit{fixpT}, that takes a function $f$ and an $x$ as arguments. The purpose of \textit{fixpT} is to calculate a fixpoint of the function $f$ starting from the argument $x$. A fixpoint, say $y$, is when $f(y) = y$ holds. That means \textit{fixpT} behaves as follows: \begin{center} \begin{tabular}{lcl} $\textit{fixpT}(f, x)$ & $\dn$ & $\begin{cases} \;x & \textit{if}\;f(x) = x\\ \;\textit{fixpT}(f, f(x)) & \textit{otherwise} \end{cases}$ \end{tabular}\end{center} Make sure you calculate in the code of $\textit{fixpT}$ the result of $f(x)$ only once. Given the type variable \texttt{A} in $\textit{fixpT}$, the type of $f$ is \texttt{A => A} and the type of $x$ is \texttt{A}. The file \texttt{re2.scala} gives two example function where in one the fixpoint is 1 and in the other it is the string $a$.\\ \mbox{}\hfill[1 Mark] \end{itemize}\bigskip \subsection*{Background}Although easily implementable in Scala, the idea behind the derivativefunction might not so easy to be seen. To understand its purposebetter, assume a regular expression $r$ can match strings of the form$c::cs$ (that means strings which start with a character $c$ and havesome rest, or tail, $cs$). If you now take the derivative of $r$ withrespect to the character $c$, then you obtain a regular expressionsthat can match all the strings $cs$. In other words, the regularexpression $\textit{der}\;c\;r$ can match the same strings $c::cs$that can be matched by $r$, except that the $c$ is chopped off.Assume now $r$ can match the string $abc$. If you take the derivativeaccording to $a$ then you obtain a regular expression that can match$bc$ (it is $abc$ where the $a$ has been chopped off). If you nowbuild the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ youobtain a regular expression that can match the string $c$ (it is $bc$where $b$ is chopped off). If you finally build the derivative of thisaccording $c$, that is$\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r)))$, youobtain a regular expression that can match the empty string. You cantest this using the function nullable, which is what your matcher isdoing.The purpose of the simp function is to keep the regular expressionsmall. Normally the derivative function makes the regular expressionbigger (see the SEQ case and the example in (2b)) and the algorithmwould be slower and slower over time. The simp function counters thisincrease in size and the result is that the algorithm is fastthroughout. By the way, this algorithm is by Janusz Brzozowski whocame up with the idea of derivatives in 1964 in his PhD thesis.\begin{center}\small\url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)}\end{center}\end{document}%%% Local Variables: %%% mode: latex%%% TeX-master: t%%% End: