diff -r 1e4da6d2490c -r e5f4b8ff23b8 coursework/cw01.tex
--- a/coursework/cw01.tex Fri Sep 26 14:06:55 2014 +0100
+++ b/coursework/cw01.tex Fri Sep 26 14:40:49 2014 +0100
@@ -6,23 +6,34 @@
 \section*{Coursework 1}
-This coursework is worth 3\% and is due on 12 November at 16:00. You are asked to implement
-a regular expression matcher and submit a document containing the answers for the questions
-below. You can do the implementation in any programming language you like, but you need
-to submit the source code with which you answered the questions. However, the coursework
-will \emph{only} be judged according to the answers. You can submit your answers
-in a txt-file or pdf.\bigskip
+This coursework is worth 5\% and is due on 13 October at 16:00. You
+are asked to implement a regular expression matcher and submit a
+document containing the answers for the questions below. You can do
+the implementation in any programming language you like, but you need
+to submit the source code with which you answered the
+questions. However, the coursework will \emph{only} be judged
+according to the answers. You can submit your answers in a txt-file or
+pdf.
+
+\subsubsection*{Disclaimer}
-\noindent
-The task is to implement a regular expression matcher based on derivatives. The implementation
-should be able to deal with the usual regular expressions
+It should be understood that the work submitted represents your own effort.
+You have not copied from anyone else. An exception is the Scala code I
+showed during the lectures, which you can use.\bigskip
+
+
+\subsubsection*{Tasks}
+
+The task is to implement a regular expression matcher based on
+derivatives. The implementation should be able to deal with the usual
+(basic) regular expressions
 \[
 \varnothing, \epsilon, c, r_1 + r_2, r_1 \cdot r_2, r^*
 \]
 \noindent
-but also with
+but also with the following extended regular expressions:
 \begin{center}
 \begin{tabular}{ll}
@@ -40,38 +51,67 @@
 \begin{center}
 \begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l}
-$L([c_1 c_2 \ldots c_n])$ & $\dn$ & $\{"c_1", "c_2", \ldots, "c_n"\}$\\
-$L(r^+)$ & $\dn$ & $\bigcup_{1\le i}. L(r)^i$\\
-$L(r^?)$ & $\dn$ & $L(r) \cup \{""\}$\\
-$L(r^{\{n,m\}})$ & $\dn$ & $\bigcup_{n\le i \le m}. L(r)^i$\\
-$L(\sim{}r)$ & $\dn$ & $UNIV - L(r)$
+$L([c_1 c_2 \ldots c_n])$ & $\dn$ & $\{[c_1], [c_2], \ldots, [c_n]\}$\\
+$L(r^+)$ & $\dn$ & $\bigcup_{1\le i}. L(r)^i$\\
+$L(r^?)$ & $\dn$ & $L(r) \cup \{[]\}$\\
+$L(r^{\{n,m\}})$ & $\dn$ & $\bigcup_{n\le i \le m}. L(r)^i$\\
+$L(\sim{}r)$ & $\dn$ & $\mathbb{A} - L(r)$
 \end{tabular}
 \end{center}
 \noindent
-whereby in the last clause the set $UNIV$ stands for the set of \emph{all} strings.
-So $\sim{}r$ means `all the strings that $r$ cannot match'. We assume ranges
-like $[a\mbox{-}z0\mbox{-}9]$ are a shorthand for the regular expression
+whereby in the last clause the set $\mathbb{A}$ stands for the set of
+\emph{all} strings. So $\sim{}r$ means `all the strings that $r$
+cannot match'. We assume ranges like $[a\mbox{-}z0\mbox{-}9]$ are a
+shorthand for the regular expression
 \[
 [a b c d\ldots z 0 1\ldots 9]\;.
 \]
 \noindent
-Be careful that your implementation of $nullable$ and $der$ satisfies for every $r$ the following two
-properties:
+Be careful that your implementation of $nullable$ and $der$ satisfies
+for every $r$ the following two properties:
 \begin{itemize}
-\item $nullable(r)$ if and only if $""\in L(r)$
+\item $nullable(r)$ if and only if $[]\in L(r)$
 \item $L(der\,c\,r)) = Der\,c\,(L(r))$
 \end{itemize}
-\newpage
+
+\noindent
+{\bf Important!} Your implementation should have explicit cases for the
+basic regular expressions, but also for the extended regular expressions.
+That means do not treat the extended regular expressions by just translating
+them into the basic ones. See also Question 2, where you are asked to give
+the rules for \textit{nullable} and \textit{der}.
+
 \subsection*{Question 1 (unmarked)}
-What is your King's email address (you will need it in the next question)?\bigskip
+What is your King's email address (you will need it in Question 2)?
+
+\subsection*{Question 2 (marked with 2\%)}
+
+This question does not require any implementation. From the lectures
+you have seen the definitions for the functions \textit{nullable} and
+\textit{der}. Give the rules for the extended regular expressions:
-\subsection*{Question 2 (marked with 1\%)}
+\begin{center}
+\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
+$nullable([c_1 c_2 \ldots c_n])$ & $\dn$ & $?$\\
+$nullable(r^+)$ & $\dn$ & $?$\\
+$nullable(r^?)$ & $\dn$ & $?$\\
+$nullable(r^{\{n,m\}})$ & $\dn$ & $?$\\
+$nullable(\sim{}r)$ & $\dn$ & $?$\medskip\\
+$der c ([c_1 c_2 \ldots c_n])$ & $\dn$ & $?$\\
+$der c (r^+)$ & $\dn$ & $?$\\
+$der c (r^?)$ & $\dn$ & $?$\\
+$der c (r^{\{n,m\}})$ & $\dn$ & $?$\\
+$der c (\sim{}r)$ & $\dn$ & $?$\\
+\end{tabular}
+\end{center}
+
+\subsection*{Question 3 (marked with 1\%)}
 Implement the following regular expression for email addresses
@@ -80,9 +120,10 @@
 \]
 \noindent
-and calculate the derivative according to your email address. When calculating
-the derivative, simplify all regular expressions as much as possible, but at least apply the following
-six simplification rules:
+and calculate the derivative according to your email address. When
+calculating the derivative, simplify all regular expressions as much
+as possible, but at least apply the following six simplification
+rules:
 \begin{center}
 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}l}
@@ -96,12 +137,15 @@
 \end{center}
 \noindent
-Write down your simplified derivative in the ``mathematicical'' notation using parentheses where necessary.
+Write down your simplified derivative in the ``mathematical''
+notation using parentheses where necessary.
+
+\subsection*{Question 4 (marked with 1\%)}
-\subsection*{Question 3 (marked with 1\%)}
-
-Consider the regular expression $/ \cdot * \cdot (\sim{}([a\mbox{-}z]^* \cdot * \cdot / \cdot [a\mbox{-}z]^*)) \cdot * \cdot /$ and decide
-wether the following four strings are matched by this regular expression. Answer yes or no.
+Consider the regular expression $/ \cdot * \cdot
+(\sim{}([a\mbox{-}z]^* \cdot * \cdot / \cdot [a\mbox{-}z]^*)) \cdot *
+\cdot /$ and decide whether the following four strings are matched by
+this regular expression. Answer yes or no.
 \begin{enumerate}
 \item \texttt{"/**/"}
@@ -110,16 +154,18 @@
 \item \texttt{"/*test/*test*/"}
 \end{enumerate}
-\subsection*{Question 4 (marked with 1\%)}
+\subsection*{Question 5 (marked with 1\%)}
-Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be $(a^{\{19,19\}}) \cdot (a^?)$.
-Decide whether the following three strings consisting of $a$s only can be matched by $(r_1^+)^+$.
-Similarly test them with $(r_2^+)^+$. Again answer in all six cases with yes or no. \medskip
+Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be
+$(a^{\{19,19\}}) \cdot (a^?)$. Decide whether the following three
+strings consisting of $a$s only can be matched by $(r_1^+)^+$.
+Similarly test them with $(r_2^+)^+$. Again answer in all six cases
+with yes or no. \medskip
 \noindent
-These are strings are meant to be entirely made up of $a$s. Be careful when
-copy-and-pasting the strings so as to not forgetting any $a$ and to not introducing any
-other character.
+These strings are meant to be entirely made up of $a$s. Be careful
+when copy-and-pasting the strings so as not to forget any $a$ and
+not to introduce any other character.
 \begin{enumerate}
 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\