% !TEX program = xelatex+ −
\documentclass{article}+ −
\usepackage{../style}+ −
\usepackage{../langs}+ −
+ −
\usepackage{array}+ −
+ −
+ −
\begin{document}+ −
\newcolumntype{C}[1]{>{\centering}m{#1}}+ −
+ −
\section*{Coursework 1}+ −
+ −
This coursework is worth 5\% and is due on \cwONE{} at 18:00. You are+ −
asked to implement a regular expression matcher and submit a document+ −
containing the answers for the questions below. You can do the+ −
implementation in any programming language you like, but you need to+ −
submit the source code with which you answered the questions,+ −
otherwise a mark of 0\% will be awarded. You can submit your answers+ −
in a txt-file or pdf. Code send as code. Please package everything+ −
inside a zip-file that creates a directory with the name+ −
\[\texttt{YournameYourfamilyname}\]+ −
+ −
\noindent on my end. Thanks!+ −
+ −
+ −
+ −
\subsubsection*{Disclaimer\alert}+ −
+ −
It should be understood that the work you submit represents+ −
your own effort. You have not copied from anyone else. An+ −
exception is the Scala code I showed during the lectures or+ −
uploaded to KEATS, which you can freely use.\bigskip+ −
+ −
\noindent+ −
If you have any questions, please send me an email in \textbf{good}+ −
time.\bigskip+ −
+ −
\subsection*{Task}+ −
+ −
The task is to implement a regular expression matcher based on+ −
derivatives of regular expressions. The implementation should+ −
be able to deal with the usual (basic) regular expressions+ −
+ −
\[+ −
\ZERO,\; \ONE,\; c,\; r_1 + r_2,\; r_1 \cdot r_2,\; r^*+ −
\]+ −
+ −
\noindent+ −
but also with the following extended regular expressions:+ −
+ −
\begin{center}+ −
\begin{tabular}{ll}+ −
$[c_1,c_2,\ldots,c_n]$ & a set of characters---for character ranges\\+ −
$r^+$ & one or more times $r$\\+ −
$r^?$ & optional $r$\\+ −
$r^{\{n\}}$ & exactly $n$-times\\+ −
$r^{\{..m\}}$ & zero or more times $r$ but no more than $m$-times\\+ −
$r^{\{n..\}}$ & at least $n$-times $r$\\+ −
$r^{\{n..m\}}$ & at least $n$-times $r$ but no more than $m$-times\\+ −
$\sim{}r$ & not-regular-expression of $r$\\+ −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent You can assume that $n$ and $m$ are greater or equal than+ −
$0$. In the case of $r^{\{n,m\}}$ you can also assume $0 \le n \le m$.\bigskip+ −
+ −
\noindent {\bf Important!} Your implementation should have explicit+ −
case classes for the basic regular expressions, but also explicit case+ −
classes for+ −
the extended regular expressions.\footnote{Please call them+ −
\code{RANGE}, \code{PLUS}, \code{OPTIONAL}, \code{NTIMES},+ −
\code{UPTO}, \code{FROM} and \code{BETWEEN}.} + −
That means do not treat the extended regular expressions+ −
by just translating them into the basic ones. See also Question 3,+ −
where you are asked to explicitly give the rules for \textit{nullable}+ −
and \textit{der} for the extended regular expressions. Something like+ −
+ −
\[der\,c\,(r^+) \dn der\,c\,(r\cdot r^*)\]+ −
+ −
\noindent is \emph{not} allowed as answer in Question 3 and \emph{not}+ −
allowed in your code.\medskip+ −
+ −
\noindent+ −
The meanings of the extended regular expressions are+ −
+ −
\begin{center}+ −
\begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l}+ −
$L([c_1,c_2,\ldots,c_n])$ & $\dn$ & $\{[c_1], [c_2], \ldots, [c_n]\}$\\ + −
$L(r^+)$ & $\dn$ & $\bigcup_{1\le i}.\;L(r)^i$\\+ −
$L(r^?)$ & $\dn$ & $L(r) \cup \{[]\}$\\+ −
$L(r^{\{n\}})$ & $\dn$ & $L(r)^n$\\+ −
$L(r^{\{..m\}})$ & $\dn$ & $\bigcup_{0\le i \le m}.\;L(r)^i$\\+ −
$L(r^{\{n..\}})$ & $\dn$ & $\bigcup_{n\le i}.\;L(r)^i$\\+ −
$L(r^{\{n..m\}})$ & $\dn$ & $\bigcup_{n\le i \le m}.\;L(r)^i$\\+ −
$L(\sim{}r)$ & $\dn$ & $\Sigma^* - L(r)$+ −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent whereby in the last clause the set $\Sigma^*$ stands+ −
for the set of \emph{all} strings over the alphabet $\Sigma$+ −
(in the implementation the alphabet can be just what is+ −
represented by, say, the type \pcode{Char}). So $\sim{}r$+ −
means in effect ``all the strings that $r$ cannot match''.\medskip + −
+ −
\noindent+ −
Be careful that your implementation of \textit{nullable} and+ −
\textit{der} satisfies for every regular expression $r$ the following+ −
two properties (see also Question 3):+ −
+ −
\begin{itemize}+ −
\item $\textit{nullable}(r)$ if and only if $[]\in L(r)$+ −
\item $L(der\,c\,r) = Der\,c\,(L(r))$+ −
\end{itemize}+ −
+ −
+ −
+ −
\subsection*{Question 1 (Unmarked)}+ −
+ −
What is your King's email address (you will need it in+ −
Question 5)?+ −
+ −
\subsection*{Question 2 (Unmarked)}+ −
+ −
Can you please list all programming languages in which you have+ −
already written programs (include only instances where you have spent+ −
at least a good working day fiddling with a program)? This is just+ −
for my curiosity to estimate what your background is.+ −
+ −
\subsection*{Question 3}+ −
+ −
From the+ −
lectures you have seen the definitions for the functions+ −
\textit{nullable} and \textit{der} for the basic regular+ −
expressions. Implement and write down the rules for the extended+ −
regular expressions:+ −
+ −
\begin{center}+ −
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}+ −
$\textit{nullable}([c_1,c_2,\ldots,c_n])$ & $\dn$ & $?$\\+ −
$\textit{nullable}(r^+)$ & $\dn$ & $?$\\+ −
$\textit{nullable}(r^?)$ & $\dn$ & $?$\\+ −
$\textit{nullable}(r^{\{n\}})$ & $\dn$ & $?$\\+ −
$\textit{nullable}(r^{\{..m\}})$ & $\dn$ & $?$\\+ −
$\textit{nullable}(r^{\{n..\}})$ & $\dn$ & $?$\\+ −
$\textit{nullable}(r^{\{n..m\}})$ & $\dn$ & $?$\\+ −
$\textit{nullable}(\sim{}r)$ & $\dn$ & $?$+ −
\end{tabular}+ −
\end{center}+ −
+ −
\begin{center}+ −
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}+ −
$der\, c\, ([c_1,c_2,\ldots,c_n])$ & $\dn$ & $?$\\+ −
$der\, c\, (r^+)$ & $\dn$ & $?$\\+ −
$der\, c\, (r^?)$ & $\dn$ & $?$\\+ −
$der\, c\, (r^{\{n\}})$ & $\dn$ & $?$\\+ −
$der\, c\, (r^{\{..m\}})$ & $\dn$ & $?$\\+ −
$der\, c\, (r^{\{n..\}})$ & $\dn$ & $?$\\+ −
$der\, c\, (r^{\{n..m\}})$ & $\dn$ & $?$\\+ −
$der\, c\, (\sim{}r)$ & $\dn$ & $?$\\+ −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent+ −
Remember your definitions have to satisfy the two properties+ −
+ −
\begin{itemize}+ −
\item $\textit{nullable}(r)$ if and only if $[]\in L(r)$+ −
\item $L(der\,c\,r)) = Der\,c\,(L(r))$+ −
\end{itemize}+ −
+ −
\noindent+ −
Given the definitions of \textit{nullable} and \textit{der}, it is+ −
easy to implement a regular expression matcher. Test your regular+ −
expression matcher with (at least) the examples:+ −
+ −
+ −
\begin{center}+ −
\def\arraystretch{1.2} + −
\begin{tabular}{@{}r|m{3mm}|m{6mm}|m{6mm}|m{10mm}|m{6mm}|m{10mm}|m{10mm}|m{10mm}}+ −
string & $a^?$ & $\sim{}a$ & $a^{\{3\}}$ & $(a^?)^{\{3\}}$ & $a^{\{..3\}}$ &+ −
$(a^?)^{\{..3\}}$ & $a^{\{3..5\}}$ & $(a^?)^{\{3..5\}}$ \\\hline+ −
$[]$ &&&&&&& \\\hline + −
\texttt{a} &&&&&&& \\\hline + −
\texttt{aa} &&&&&&& \\\hline + −
\texttt{aaa} &&&&&&& \\\hline + −
\texttt{aaaaa} &&&&&&& \\\hline + −
\texttt{aaaaaa}&&&&&&& \\+ −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent+ −
Does your matcher produce the expected results? Make sure you+ −
also test corner-cases, like $a^{\{0\}}$!+ −
+ −
\subsection*{Question 4}+ −
+ −
As you can see, there are a number of explicit regular expressions+ −
that deal with single or several characters, for example:+ −
+ −
\begin{center}+ −
\begin{tabular}{ll}+ −
$c$ & matches a single character\\ + −
$[c_1,c_2,\ldots,c_n]$ & matches a set of characters---for character ranges\\+ −
$\textit{ALL}$ & matches any character+ −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent+ −
The latter is useful for matching any string (for example+ −
by using $\textit{ALL}^*$). In order to avoid having an explicit constructor+ −
for each case, we can generalise all these cases and introduce a single+ −
constructor $\textit{CFUN}(f)$ where $f$ is a function from characters+ −
to booleans. In Scala code this would look as follows:+ −
+ −
\begin{lstlisting}[numbers=none]+ −
abstract class Rexp+ −
...+ −
case class CFUN(f: Char => Boolean) extends Rexp + −
\end{lstlisting}\smallskip+ −
+ −
\noindent+ −
The idea is that the function $f$ determines which character(s)+ −
are matched, namely those where $f$ returns \texttt{true}.+ −
In this question implement \textit{CFUN} and define+ −
+ −
\begin{center}+ −
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}+ −
$\textit{nullable}(\textit{CFUN}(f))$ & $\dn$ & $?$\\+ −
$\textit{der}\,c\,(\textit{CFUN}(f))$ & $\dn$ & $?$+ −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent in your matcher and then also give definitions for+ −
+ −
\begin{center}+ −
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}+ −
$c$ & $\dn$ & $\textit{CFUN}(?)$\\+ −
$[c_1,c_2,\ldots,c_n]$ & $\dn$ & $\textit{CFUN}(?)$\\+ −
$\textit{ALL}$ & $\dn$ & $\textit{CFUN}(?)$+ −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent+ −
You can either add the constructor $CFUN$ to your implementation in+ −
Question 3, or you can implement this questions first+ −
and then use $CFUN$ instead of \code{RANGE} and \code{CHAR} in Question 3.+ −
+ −
+ −
\subsection*{Question 5}+ −
+ −
Suppose $[a\mbox{-}z0\mbox{-}9\_\,.\mbox{-}]$ stands for the regular expression+ −
+ −
\[[a,b,c,\ldots,z,0,\dots,9,\_,.,\mbox{-}]\;.\]+ −
+ −
\noindent+ −
Define in your code the following regular expression for email addresses+ −
+ −
\[+ −
([a\mbox{-}z0\mbox{-}9\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}})+ −
\]+ −
+ −
\noindent and calculate the derivative according to your own email+ −
address. When calculating the derivative, simplify all regular+ −
expressions as much as possible by applying the+ −
following 7 simplification rules:+ −
+ −
\begin{center}+ −
\begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll}+ −
$r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ + −
$\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ + −
$r \cdot \ONE$ & $\mapsto$ & $r$\\ + −
$\ONE \cdot r$ & $\mapsto$ & $r$\\ + −
$r + \ZERO$ & $\mapsto$ & $r$\\ + −
$\ZERO + r$ & $\mapsto$ & $r$\\ + −
$r + r$ & $\mapsto$ & $r$\\ + −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent Write down your simplified derivative in a readable+ −
notation using parentheses where necessary. That means you+ −
should use the infix notation $+$, $\cdot$, $^*$ and so on,+ −
instead of raw code.\bigskip+ −
+ −
+ −
\subsection*{Question 6}+ −
+ −
Implement the simplification rules in your regular expression matcher.+ −
Consider the regular expression $/ \cdot * \cdot+ −
(\sim{}(\textit{ALL}^* \cdot * \cdot / \cdot \textit{ALL}^*)) \cdot *+ −
\cdot /$ and decide whether the following four strings are matched by+ −
this regular expression. Answer yes or no.+ −
+ −
\begin{enumerate}+ −
\item \texttt{"/**/"}+ −
\item \texttt{"/*foobar*/"}+ −
\item \texttt{"/*test*/test*/"}+ −
\item \texttt{"/*test/*test*/"}+ −
\end{enumerate}+ −
+ −
\subsection*{Question 7}+ −
+ −
Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be+ −
$(a^{\{19,19\}}) \cdot (a^?)$.\medskip+ −
+ −
\noindent+ −
Decide whether the following three+ −
strings consisting of $a$s only can be matched by $(r_1^+)^+$.+ −
Similarly test them with $(r_2^+)^+$. Again answer in all six cases+ −
with yes or no. \medskip+ −
+ −
\noindent+ −
These are strings are meant to be entirely made up of $a$s. Be careful+ −
when copy-and-pasting the strings so as to not forgetting any $a$ and+ −
to not introducing any other character.+ −
+ −
\begin{enumerate}+ −
\setcounter{enumi}{4}+ −
\item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\+ −
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\+ −
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}+ −
\item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ + −
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ + −
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}+ −
\item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ + −
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ + −
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}+ −
\end{enumerate}+ −
+ −
+ −
+ −
\end{document}+ −
+ −
%%% Local Variables: + −
%%% mode: latex+ −
%%% TeX-master: t+ −
%%% End: + −