pep-material: comparison cws/cw03.tex

-:ca5884c2e3bd
+:8da9e0c16194
 \begin{document}
 \section*{Coursework 8 (Scala, Regular Expressions)}
-This coursework is worth 10\% and is due on XXXX at
+This coursework is worth 10\%. It is about regular expressions and
-16:00. You are asked to implement a regular expression matcher.
+pattern matching. The first part is due on 30 November at 11pm; the
+second, more advanced part, is due on 7 December at 11pm. The
-Make sure the files
+second part is not yet included. For the first part you are
+asked to implement a regular expression matcher. Make sure the files
 you submit can be processed by just calling \texttt{scala
 <<filename.scala>>}.\bigskip
 \noindent
 \textbf{Important:} Do not use any mutable data structures in your
 submissions! They are not needed. This excludes the use of
 \texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
-code! It has a different meaning in Scala, than in Java.
+code! It has a different meaning in Scala than in Java.  Do not use
-Do not use \texttt{var}! This declares a mutable variable. Feel free to
+\texttt{var}! This declares a mutable variable.  Make sure the
-copy any code you need from files \texttt{knight1.scala},
-\texttt{knight2.scala} and \texttt{knight3.scala}. Make sure the
 functions you submit are defined on the ``top-level'' of Scala, not
 inside a class or object.
-\subsection*{Disclaimer}
+\subsection*{Disclaimer!!!!!!!!}
 It should be understood that the work you submit represents
-your own effort. You have not copied from anyone else. An
+your own effort! You have not copied from anyone else. An
 exception is the Scala code I showed during the lectures or
 uploaded to KEATS, which you can freely use.\bigskip
-\subsubsection*{Task}
+\subsection*{Part 1 (6 Marks)}
 The task is to implement a regular expression matcher based on
-derivatives of regular expressions. The implementation should
+derivatives of regular expressions. The implementation can deal
-be able to deal with the usual (basic) regular expressions
+with the following regular expressions, which have been predefined
+in the file re.scala:
 \begin{center}
 \begin{tabular}{lcll}
 $r$ & $::=$ & $\ZERO$     & cannot match anything\\
 &   $|$ & $\ONE$      & can only match the empty string\\
-&   $|$ & $c$         & can match a character $c$\\
+&   $|$ & $c$         & can match a character (in this case $c$)\\
-&   $|$ & $r_1 + r_2$ & can match either with $r_1$ or with $r_2$\\
+&   $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\
-&   $|$ & $r_1 \cdot r_2$ & can match first with $r_1$ and then with $r_2$\\
+&   $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\
+&  & & then the second part with $r_2$\\
 &   $|$ & $r^*$       & can match $r$ zero or more times\\
-&   $|$ & $r^{\{\uparrow n\}}$ & can match zero upto $n$ times $r$\\
-&   $|$ & $r^{\{n\}}$ & can match exactly $n$ times $r$\\
 \end{tabular}
 \end{center}
 \noindent
-Implement a function called \textit{nullable} by recursion over
+Why? Knowing how to match regular expressions and strings fast will
-regular expressions:
+let you solve a lot of problems that vex other humans. Regular
+expressions are one of the fastest and simplest ways to match patterns
+in text, and are endlessly useful for searching, editing and
+analysing text in all sorts of places. However, you need to be
+fast, otherwise you will stumble upon problems such as those recently reported at
+{\small
+\begin{itemize}
+\item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
+\item[$\bullet$] \url{https://vimeo.com/112065252}
+\item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/}
+\end{itemize}}
+\subsection*{Tasks (file re.scala)}
+\begin{itemize}
+\item[(1a)] Implement a function, called \textit{nullable}, by recursion over
+regular expressions. This function tests whether a regular expression can match
+the empty string.
 \begin{center}
 \begin{tabular}{lcl}
 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
 $\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
 $\textit{nullable}(c)$     & $\dn$ & $\textit{false}$\\
 $\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
 $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
 $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
-$\textit{nullable}(r^{\{\uparrow n\}})$ & $\dn$ & $\textit{true}$\\
-$\textit{nullable}(r^{\{n\}})$ & $\dn$ &
-$\textit{if}\;n = 0\; \textit{then} \; \textit{true} \; \textit{else} \; \textit{nullable}(r)$\\
 \end{tabular}
-\end{center}
+\end{center}\hfill[1 Mark]
+\item[(1b)] Implement a function, called \textit{der}, by recursion over
+regular expressions. It takes a character and a regular expression
+as arguments and calculates the derivative regular expression.
 \begin{center}
 \begin{tabular}{lcl}
 $\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
 $\textit{der}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
 $\textit{der}\;c\;(d)$     & $\dn$ & $\textit{if}\;c = d\;\textit{then}\;\ONE\;\textit{else}\;\ZERO$\\
 $\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
 $\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
 & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
 & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
 $\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\
-$\textit{der}\;c\;(r^{\{\uparrow n\}})$ & $\dn$ & $\textit{if}\;n = 0\;\textit{then}\;\ZERO\;\text{else}\;
-(\textit{der}\;c\;r)\cdot (r^{\{\uparrow n-1\}})$\\
-$\textit{der}\;c\;(r^{\{n\}})$ & $\dn$ &
-$\textit{if}\;n = 0\; \textit{then} \; \ZERO\; \textit{else}\;$\\
-& & $\textit{if} \;\textit{nullable}(r)\;\textit{then}\;(\textit{der}\;c\;r)\cdot (r^{\{\uparrow n-1\}})$\\
-& & $\textit{else}\;(\textit{der}\;c\;r)\cdot (r^{\{n-1\}})$
 \end{tabular}
-\end{center}
+\end{center}\hfill[1 Mark]
+\item[(1c)] Implement the function \textit{simp}, which recursively
+traverses a regular expression from the inside to the outside, and
+simplifies every sub-regular-expression of the form on the left to
+the regular expression on the right, except that it does not simplify inside
+${}^*$-regular expressions.
-Be careful that your implementation of \textit{nullable} and
+\begin{center}
-\textit{der}\;c\; satisfies for every $r$ the following two
-properties (see also Question 2):
-\begin{itemize}
-\item $\textit{nullable}(r)$ if and only if $[]\in L(r)$
-\item $L(der\,c\,r) = Der\,c\,(L(r))$
-\end{itemize}
-\noindent {\bf Important!} Your implementation should have
-explicit cases for the basic regular expressions, but also
-explicit cases for the extended regular expressions. That
-means do not treat the extended regular expressions by just
-translating them into the basic ones. See also Question 2,
-where you are asked to explicitly give the rules for
-\textit{nullable} and \textit{der}\;c\; for the extended regular
-expressions.
-\subsection*{Question 1}
-What is your King's email address (you will need it in
-Question 3)?
-\subsection*{Question 2}
-This question does not require any implementation. From the
-lectures you have seen the definitions for the functions
-\textit{nullable} and \textit{der}\;c\; for the basic regular
-expressions. Give the rules for the extended regular
-expressions:
-\begin{center}
-\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
-$\textit{nullable}([c_1 c_2 \ldots c_n])$  & $\dn$ & $?$\\
-$\textit{nullable}(r^+)$                   & $\dn$ & $?$\\
-$\textit{nullable}(r^?)$                   & $\dn$ & $?$\\
-$\textit{nullable}(r^{\{n,m\}})$            & $\dn$ & $?$\\
-$\textit{nullable}(\sim{}r)$               & $\dn$ & $?$\medskip\\
-$der\, c\, ([c_1 c_2 \ldots c_n])$  & $\dn$ & $?$\\
-$der\, c\, (r^+)$                   & $\dn$ & $?$\\
-$der\, c\, (r^?)$                   & $\dn$ & $?$\\
-$der\, c\, (r^{\{n,m\}})$            & $\dn$ & $?$\\
-$der\, c\, (\sim{}r)$               & $\dn$ & $?$\\
-\end{tabular}
-\end{center}
-\noindent
-Remember your definitions have to satisfy the two properties
-\begin{itemize}
-\item $\textit{nullable}(r)$ if and only if $[]\in L(r)$
-\item $L(der\,c\,r)) = Der\,c\,(L(r))$
-\end{itemize}
-\subsection*{Question 3}
-Implement the following regular expression for email addresses
-\[
-([a\mbox{-}z0\mbox{-}9\_\!\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}})
-\]
-\noindent and calculate the derivative according to your email
-address. When calculating the derivative, simplify all regular
-expressions as much as possible by applying the
-following 7 simplification rules:
-\begin{center}
 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll}
 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\
 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\
 $r \cdot \ONE$ & $\mapsto$ & $r$\\
 $\ONE \cdot r$ & $\mapsto$ & $r$\\
 $r + \ZERO$ & $\mapsto$ & $r$\\
 $\ZERO + r$ & $\mapsto$ & $r$\\
 $r + r$ & $\mapsto$ & $r$\\
 \end{tabular}
+\end{center}
+For example
+\[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
+simplifies to just $r_1$.
+\hfill[1 Mark]
+\item[(1d)] Implement two functions: The first, called \textit{ders},
+takes a list of characters and a regular expression as arguments and
+builds the derivative as follows:
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{ders}\;Nil\;(r)$ & $\dn$ & $r$\\
+$\textit{ders}\;(c::cs)\;(r)$  & $\dn$ &
+$\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
+\end{tabular}
 \end{center}
-\noindent Write down your simplified derivative in a readable
+The second, called \textit{matcher}, takes a string and a regular expression
-notation using parentheses where necessary. That means you
+as arguments. It first builds the derivative according to \textit{ders}
-should use the infix notation $+$, $\cdot$, $^*$ and so on,
+and at the end tests whether the resulting regular expression can match
-instead of code.
+the empty string (using \textit{nullable}).
+For example, \textit{matcher} will produce \textit{true} when given the
-\subsection*{Question 4}
+regular expression $a\cdot b\cdot c$ and the string $abc$.
+\hfill[1 Mark]
-Suppose \textit{[a-z]} stands for the range regular expression
+\item[(1e)] Implement the function \textit{replace}: it searches (from left to
-$[a,b,c,\ldots,z]$.  Consider the regular expression $/ \cdot * \cdot
+right) in the string $s_1$ for all non-empty substrings that match the
-(\sim{}([a\mbox{-}z]^* \cdot * \cdot / \cdot [a\mbox{-}z]^*)) \cdot *
+regular expression. These substrings are assumed to be
-\cdot /$ and decide wether the following four strings are matched by
+the longest substrings matched by the regular expression and
-this regular expression. Answer yes or no.
+assumed to be non-overlapping. All these substrings in $s_1$ are replaced
+by $s_2$. For example, given the regular expression
-\begin{enumerate}
+\[(a \cdot a)^* + (b \cdot b)\]
-\item \texttt{"/**/"}
-\item \texttt{"/*foobar*/"}
-\item \texttt{"/*test*/test*/"}
-\item \texttt{"/*test/*test*/"}
-\end{enumerate}
-\noindent
+\noindent the string $aabbbaaaaaaabaaaaabbaaaabb$ and
-Also test your regular expression matcher with the regular
+replacement string $c$ yields the string
-expression $a^{\{3,5\}}$ and the strings
-\begin{enumerate}
+\[
-\setcounter{enumi}{4}
+ccbcabcaccc
-\item \texttt{aa}
+\]
-\item \texttt{aaa}
-\item \texttt{aaaaa}
-\item \texttt{aaaaaa}
-\end{enumerate}
-\noindent
+\hfill[2 Marks]
-Does your matcher produce the expected results?
+\end{itemize}
-\subsection*{Question 5}
+\subsection*{Part 2 (4 Marks)}
-Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be
+Coming soon.
-$(a^{\{19,19\}}) \cdot (a^?)$.  Decide whether the following three
-strings consisting of $a$s only can be matched by $(r_1^+)^+$.
-Similarly test them with $(r_2^+)^+$. Again answer in all six cases
-with yes or no. \medskip
-\noindent
-These are strings are meant to be entirely made up of $a$s. Be careful
-when copy-and-pasting the strings so as to not forgetting any $a$ and
-to not introducing any other character.
-\begin{enumerate}
-\item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
-\item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
-\item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
-\end{enumerate}
 \end{document}
 %%% Local Variables:
 %%% mode: latex
 %%% TeX-master: t
 %%% End:

changeset 68	8da9e0c16194
parent 62	2151c77e1e24
child 69	f1295a0ab4ed
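
The recursive definitions for tasks (1a) to (1e) in the new revision translate fairly directly into Scala pattern matching. The following is only a minimal sketch, not the reference solution. The coursework says the regular expressions are predefined in re.scala but does not show them, so the datatype below (\texttt{Rexp}, \texttt{ZERO}, \texttt{ONE}, \texttt{CHAR}, \texttt{ALT}, \texttt{SEQ}, \texttt{STAR}) is an assumption, as are the argument orders of all functions and the helper \texttt{longestMatch}, which is hypothetical and not named in the sheet. The sketch avoids \texttt{var}, \texttt{return} and mutable data structures, as the sheet requires.

```scala
// Assumed datatype: the sheet does not show the contents of re.scala,
// so these constructor names are hypothetical.
abstract class Rexp
case object ZERO extends Rexp                    // cannot match anything
case object ONE extends Rexp                     // matches only the empty string
case class CHAR(c: Char) extends Rexp            // matches the character c
case class ALT(r1: Rexp, r2: Rexp) extends Rexp  // r1 + r2
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp  // r1 . r2
case class STAR(r: Rexp) extends Rexp            // r*

// (1a) can r match the empty string?
def nullable(r: Rexp): Boolean = r match {
  case ZERO => false
  case ONE => true
  case CHAR(_) => false
  case ALT(r1, r2) => nullable(r1) || nullable(r2)
  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
  case STAR(_) => true
}

// (1b) derivative of r with respect to the character c
def der(c: Char, r: Rexp): Rexp = r match {
  case ZERO => ZERO
  case ONE => ZERO
  case CHAR(d) => if (c == d) ONE else ZERO
  case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
  case SEQ(r1, r2) =>
    if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
    else SEQ(der(c, r1), r2)
  case STAR(r1) => SEQ(der(c, r1), STAR(r1))
}

// (1c) simplify from the inside out; nothing inside a STAR is touched
def simp(r: Rexp): Rexp = r match {
  case ALT(r1, r2) => (simp(r1), simp(r2)) match {
    case (ZERO, s2) => s2
    case (s1, ZERO) => s1
    case (s1, s2) => if (s1 == s2) s1 else ALT(s1, s2)
  }
  case SEQ(r1, r2) => (simp(r1), simp(r2)) match {
    case (ZERO, _) => ZERO
    case (_, ZERO) => ZERO
    case (ONE, s2) => s2
    case (s1, ONE) => s1
    case (s1, s2) => SEQ(s1, s2)
  }
  case _ => r
}

// (1d) derivative with respect to a list of characters, simplifying each step
def ders(cs: List[Char], r: Rexp): Rexp = cs match {
  case Nil => r
  case c :: rest => ders(rest, simp(der(c, r)))
}

def matcher(r: Rexp, s: String): Boolean = nullable(ders(s.toList, r))

// Hypothetical helper for (1e): length of the longest non-empty prefix of cs
// matched by r, or 0 if there is none. Stopping at ZERO is only a shortcut.
def longestMatch(r: Rexp, cs: List[Char], len: Int = 0, best: Int = 0): Int = cs match {
  case Nil => best
  case c :: rest =>
    val d = simp(der(c, r))
    if (d == ZERO) best
    else longestMatch(d, rest, len + 1, if (nullable(d)) len + 1 else best)
}

// (1e) replace the longest non-overlapping matches in s1 by s2, left to right
def replace(r: Rexp, s1: String, s2: String): String = {
  def go(cs: List[Char]): String = cs match {
    case Nil => ""
    case c :: rest =>
      val n = longestMatch(r, cs)
      if (n == 0) c.toString + go(rest) else s2 + go(cs.drop(n))
  }
  go(s1.toList)
}
```

Under these assumptions, \texttt{matcher(SEQ(CHAR('a'), SEQ(CHAR('b'), CHAR('c'))), "abc")} evaluates to \texttt{true}, and \texttt{replace} applied to $(a \cdot a)^* + (b \cdot b)$, the string $aabbbaaaaaaabaaaaabbaaaabb$ and the replacement $c$ yields $ccbcabcaccc$, as in the example of task (1e). The inner function \texttt{go} is not tail-recursive, which keeps the sketch short but may overflow the stack on very long inputs.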