diff -r ca5884c2e3bd -r 8da9e0c16194 cws/cw03.tex --- a/cws/cw03.tex Thu Nov 24 01:44:38 2016 +0000 +++ b/cws/cw03.tex Thu Nov 24 09:42:49 2016 +0000 @@ -6,55 +6,72 @@ \section*{Coursework 8 (Scala, Regular Expressions} -This coursework is worth 10\% and is due on XXXX at -16:00. You are asked to implement a regular expression matcher. - -Make sure the files +This coursework is worth 10\%. It is about regular expressions and +pattern matching. The first part is due on 30 November at 11pm; the +second, more advanced part, is due on 7 December at 11pm. The +second part is not yet included. For the first part you are +asked to implement a regular expression matcher. Make sure the files you submit can be processed by just calling \texttt{scala - <>}.\bigskip + <>}.\bigskip \noindent \textbf{Important:} Do not use any mutable data structures in your submissions! They are not needed. This excluded the use of \texttt{ListBuffer}s, for example. Do not use \texttt{return} in your -code! It has a different meaning in Scala, than in Java. -Do not use \texttt{var}! This declares a mutable variable. Feel free to -copy any code you need from files \texttt{knight1.scala}, -\texttt{knight2.scala} and \texttt{knight3.scala}. Make sure the +code! It has a different meaning in Scala, than in Java. Do not use +\texttt{var}! This declares a mutable variable. Make sure the functions you submit are defined on the ``top-level'' of Scala, not inside a class or object. -\subsection*{Disclaimer} +\subsection*{Disclaimer!!!!!!!!} It should be understood that the work you submit represents -your own effort. You have not copied from anyone else. An +your own effort! You have not copied from anyone else. An exception is the Scala code I showed during the lectures or uploaded to KEATS, which you can freely use.\bigskip -\subsubsection*{Task} +\subsection*{Part 1 (6 Marks)} The task is to implement a regular expression matcher based on -derivatives of regular expressions. 
The implementation should -be able to deal with the usual (basic) regular expressions +derivatives of regular expressions. The implementation can deal +with the following regular expressions, which have been predefined in the +file \texttt{re.scala}: \begin{center} \begin{tabular}{lcll} $r$ & $::=$ & $\ZERO$ & cannot match anything\\ & $|$ & $\ONE$ & can only match the empty string\\ - & $|$ & $c$ & can match a character $c$\\ - & $|$ & $r_1 + r_2$ & can match either with $r_1$ or with $r_2$\\ - & $|$ & $r_1 \cdot r_2$ & can match first with $r_1$ and then with $r_2$\\ + & $|$ & $c$ & can match a character (in this case $c$)\\ + & $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\ + & $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\ + & & & then the second part with $r_2$\\ & $|$ & $r^*$ & can match zero or more times $r$\\ - & $|$ & $r^{\{\uparrow n\}}$ & can match zero upto $n$ times $r$\\ - & $|$ & $r^{\{n\}}$ & can match exactly $n$ times $r$\\ \end{tabular} \end{center} -\noindent -Implement a function called \textit{nullable} by recursion over -regular expressions: +\noindent +Why? Knowing how to match regular expressions and strings fast will +let you solve a lot of problems that vex other humans. Regular +expressions are one of the fastest and simplest ways to match patterns +in text, and are endlessly useful for searching, editing and +analysing text in all sorts of places. However, you need to be +fast, otherwise you will stumble upon problems such as those recently reported at + +{\small +\begin{itemize} +\item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016} +\item[$\bullet$] \url{https://vimeo.com/112065252} +\item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/} +\end{itemize}} + +\subsection*{Tasks (file re.scala)} + +\begin{itemize} +\item[(1a)] Implement a function, called \textit{nullable}, by recursion over + regular expressions.
This function tests whether a regular expression can match + the empty string. \begin{center} \begin{tabular}{lcl} @@ -64,11 +81,12 @@ $\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\ $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\ $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\ -$\textit{nullable}(r^{\{\uparrow n\}})$ & $\dn$ & $\textit{true}$\\ -$\textit{nullable}(r^{\{n\}})$ & $\dn$ & - $\textit{if}\;n = 0\; \textit{then} \; \textit{true} \; \textit{else} \; \textit{nullable}(r)$\\ \end{tabular} -\end{center} +\end{center}\hfill[1 Mark] + +\item[(1b)] Implement a function, called \textit{der}, by recursion over + regular expressions. It takes a character and a regular expression + as arguments and calculates the derivative regular expression. \begin{center} \begin{tabular}{lcl} @@ -80,85 +98,16 @@ & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\ & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\ $\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\ -$\textit{der}\;c\;(r^{\{\uparrow n\}})$ & $\dn$ & $\textit{if}\;n = 0\;\textit{then}\;\ZERO\;\text{else}\; - (\textit{der}\;c\;r)\cdot (r^{\{\uparrow n-1\}})$\\ -$\textit{der}\;c\;(r^{\{n\}})$ & $\dn$ & - $\textit{if}\;n = 0\; \textit{then} \; \ZERO\; \textit{else}\;$\\ - & & $\textit{if} \;\textit{nullable}(r)\;\textit{then}\;(\textit{der}\;c\;r)\cdot (r^{\{\uparrow n-1\}})$\\ - & & $\textit{else}\;(\textit{der}\;c\;r)\cdot (r^{\{n-1\}})$ \end{tabular} -\end{center} - - -Be careful that your implementation of \textit{nullable} and -\textit{der}\;c\; satisfies for every $r$ the following two -properties (see also Question 2): - -\begin{itemize} -\item $\textit{nullable}(r)$ if and only if $[]\in L(r)$ -\item $L(der\,c\,r) = Der\,c\,(L(r))$ -\end{itemize} - -\noindent {\bf Important!} Your implementation should have -explicit cases for the basic regular
expressions, but also -explicit cases for the extended regular expressions. That -means do not treat the extended regular expressions by just -translating them into the basic ones. See also Question 2, -where you are asked to explicitly give the rules for -\textit{nullable} and \textit{der}\;c\; for the extended regular -expressions. - - -\subsection*{Question 1} - -What is your King's email address (you will need it in -Question 3)? - -\subsection*{Question 2} +\end{center}\hfill[1 Mark] -This question does not require any implementation. From the -lectures you have seen the definitions for the functions -\textit{nullable} and \textit{der}\;c\; for the basic regular -expressions. Give the rules for the extended regular -expressions: - -\begin{center} -\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}} -$\textit{nullable}([c_1 c_2 \ldots c_n])$ & $\dn$ & $?$\\ -$\textit{nullable}(r^+)$ & $\dn$ & $?$\\ -$\textit{nullable}(r^?)$ & $\dn$ & $?$\\ -$\textit{nullable}(r^{\{n,m\}})$ & $\dn$ & $?$\\ -$\textit{nullable}(\sim{}r)$ & $\dn$ & $?$\medskip\\ -$der\, c\, ([c_1 c_2 \ldots c_n])$ & $\dn$ & $?$\\ -$der\, c\, (r^+)$ & $\dn$ & $?$\\ -$der\, c\, (r^?)$ & $\dn$ & $?$\\ -$der\, c\, (r^{\{n,m\}})$ & $\dn$ & $?$\\ -$der\, c\, (\sim{}r)$ & $\dn$ & $?$\\ -\end{tabular} -\end{center} +\item[(1c)] Implement the function \textit{simp}, which recursively + traverses a regular expression from inside to outside, and + simplifies every sub-regular-expression on the left to + the regular expression on the right, except that it does not simplify inside + ${}^*$-regular expressions.
-\noindent -Remember your definitions have to satisfy the two properties - -\begin{itemize} -\item $\textit{nullable}(r)$ if and only if $[]\in L(r)$ -\item $L(der\,c\,r)) = Der\,c\,(L(r))$ -\end{itemize} - -\subsection*{Question 3} - -Implement the following regular expression for email addresses - -\[ -([a\mbox{-}z0\mbox{-}9\_\!\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}}) -\] - -\noindent and calculate the derivative according to your email -address. When calculating the derivative, simplify all regular -expressions as much as possible by applying the -following 7 simplification rules: - -\begin{center} + \begin{center} \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll} $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ @@ -168,71 +117,60 @@ $\ZERO + r$ & $\mapsto$ & $r$\\ $r + r$ & $\mapsto$ & $r$\\ \end{tabular} + \end{center} + + For example + \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\] + + simplifies to just $r_1$. + \hfill[1 Mark] + +\item[(1d)] Implement two functions: The first, called \textit{ders}, + takes a list of characters and a regular expression as arguments and + builds the derivative as follows: + +\begin{center} +\begin{tabular}{lcl} +$\textit{ders}\;Nil\;(r)$ & $\dn$ & $r$\\ + $\textit{ders}\;c::cs\;(r)$ & $\dn$ & + $\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\ +\end{tabular} \end{center} -\noindent Write down your simplified derivative in a readable -notation using parentheses where necessary. That means you -should use the infix notation $+$, $\cdot$, $^*$ and so on, -instead of code. - -\subsection*{Question 4} +The second, called \textit{matcher}, takes a string and a regular expression +as arguments. It first builds the derivative according to \textit{ders} +and at the end tests whether the resulting regular expression can match +the empty string (using \textit{nullable}).
+For example, \textit{matcher} returns true when given the +regular expression $a\cdot b\cdot c$ and the string $abc$. +\hfill[1 Mark] -Suppose \textit{[a-z]} stands for the range regular expression -$[a,b,c,\ldots,z]$. Consider the regular expression $/ \cdot * \cdot -(\sim{}([a\mbox{-}z]^* \cdot * \cdot / \cdot [a\mbox{-}z]^*)) \cdot * -\cdot /$ and decide wether the following four strings are matched by -this regular expression. Answer yes or no. - -\begin{enumerate} -\item \texttt{"/**/"} -\item \texttt{"/*foobar*/"} -\item \texttt{"/*test*/test*/"} -\item \texttt{"/*test/*test*/"} -\end{enumerate} - -\noindent -Also test your regular expression matcher with the regular -expression $a^{\{3,5\}}$ and the strings +\item[(1e)] Implement the function \textit{replace}: it searches (from left to +right) in the string $s_1$ for all non-empty substrings that match the +regular expression. These substrings are assumed to be +the longest substrings matched by the regular expression and +assumed to be non-overlapping. All these substrings in $s_1$ are replaced +by $s_2$. For example, given the regular expression -\begin{enumerate} -\setcounter{enumi}{4} -\item \texttt{aa} -\item \texttt{aaa} -\item \texttt{aaaaa} -\item \texttt{aaaaaa} -\end{enumerate} +\[(a \cdot a)^* + (b \cdot b)\] -\noindent -Does your matcher produce the expected results? - -\subsection*{Question 5} +\noindent the string $aabbbaaaaaaabaaaaabbaaaabb$ and +the replacement string $c$, it yields the string -Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be -$(a^{\{19,19\}}) \cdot (a^?)$. Decide whether the following three -strings consisting of $a$s only can be matched by $(r_1^+)^+$. -Similarly test them with $(r_2^+)^+$. Again answer in all six cases -with yes or no. \medskip - -\noindent -These are strings are meant to be entirely made up of $a$s. Be careful -when copy-and-pasting the strings so as to not forgetting any $a$ and -to not introducing any other character.
+\[ +ccbcabcaccc +\] -\begin{enumerate} -\item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ -aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ -aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} -\item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ -aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ -aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} -\item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ -aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ -aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} -\end{enumerate} +\hfill[2 Mark] +\end{itemize} +\subsection*{Part 2 (4 Marks)} + +Coming soon. \end{document} + %%% Local Variables: %%% mode: latex %%% TeX-master: t