coursework/cw01.tex
author Christian Urban <urbanc@in.tum.de>
Wed, 04 Dec 2019 14:39:32 +0000
changeset 703 57b3ae5ba5e2
parent 630 9b1c15c3eb6f
child 718 b7f9460c6a1c
permissions -rw-r--r--
updated

% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}

\usepackage{array}


\begin{document}
\newcolumntype{C}[1]{>{\centering}m{#1}}

\section*{Coursework 1 (Strand 1)}

This coursework is worth 4\% and is due on 11 October at
18:00. You are asked to implement a regular expression matcher
and submit a document containing the answers for the questions
below. You can do the implementation in any programming
language you like, but you need to submit the source code with
which you answered the questions, otherwise a mark of 0\% will
be awarded. You can submit your answers in a txt-file or pdf.
Code send as code.



\subsubsection*{Disclaimer}

It should be understood that the work you submit represents
your own effort. You have not copied from anyone else. An
exception is the Scala code I showed during the lectures or
uploaded to KEATS, which you can freely use.\bigskip

\noindent
If you have any questions, please send me an email in \textbf{good}
time.\bigskip

\subsection*{Task}

The task is to implement a regular expression matcher based on
derivatives of regular expressions. The implementation should
be able to deal with the usual (basic) regular expressions

\[
\ZERO,\; \ONE,\; c,\; r_1 + r_2,\; r_1 \cdot r_2,\; r^*
\]

\noindent
but also with the following extended regular expressions:

\begin{center}
\begin{tabular}{ll}
  $[c_1,c_2,\ldots,c_n]$ & a set of characters---for character ranges\\
  $r^+$ & one or more times $r$\\
  $r^?$ & optional $r$\\
  $r^{\{n\}}$ & exactly $n$-times\\
  $r^{\{..m\}}$ & zero or more times $r$ but no more than $m$-times\\
  $r^{\{n..\}}$ & at least $n$-times $r$\\
  $r^{\{n..m\}}$ & at least $n$-times $r$ but no more than $m$-times\\
  $\sim{}r$ & not-regular-expression of $r$\\
\end{tabular}
\end{center}

\noindent You can assume that $n$ and $m$ are greater or equal than
$0$. In the case of $r^{\{n,m\}}$ you can also assume $0 \le n \le m$.\bigskip

\noindent {\bf Important!} Your implementation should have explicit
case classes  for the basic regular expressions, but also explicit case
classes for
the extended regular expressions.\footnote{Please call them
  \code{RANGE}, \code{PLUS}, \code{OPTIONAL}, \code{NTIMES},
  \code{UPTO}, \code{FROM} and \code{BETWEEN}.} 
  That means do not treat the extended regular expressions
by just translating them into the basic ones. See also Question 3,
where you are asked to explicitly give the rules for \textit{nullable}
and \textit{der} for the extended regular expressions. So something like
$der\,c\,(r^+) \dn der\,c\,(r\cdot r^*)$ is \emph{not} allowed.\medskip

\noindent
The meanings of the extended regular expressions are

\begin{center}
\begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l}
  $L([c_1,c_2,\ldots,c_n])$ & $\dn$ & $\{[c_1], [c_2], \ldots, [c_n]\}$\\ 
  $L(r^+)$                  & $\dn$ & $\bigcup_{1\le i}.\;L(r)^i$\\
  $L(r^?)$                  & $\dn$ & $L(r) \cup \{[]\}$\\
  $L(r^{\{n\}})$             & $\dn$ & $L(r)^n$\\
  $L(r^{\{..m\}})$           & $\dn$ & $\bigcup_{0\le i \le m}.\;L(r)^i$\\
  $L(r^{\{n..\}})$           & $\dn$ & $\bigcup_{n\le i}.\;L(r)^i$\\
  $L(r^{\{n..m\}})$          & $\dn$ & $\bigcup_{n\le i \le m}.\;L(r)^i$\\
  $L(\sim{}r)$              & $\dn$ & $\Sigma^* - L(r)$
\end{tabular}
\end{center}

\noindent whereby in the last clause the set $\Sigma^*$ stands
for the set of \emph{all} strings over the alphabet $\Sigma$
(in the implementation the alphabet can be just what is
represented by, say, the type \pcode{Char}). So $\sim{}r$
means in effect ``all the strings that $r$ cannot match''.\medskip 

\noindent
Be careful that your implementation of \textit{nullable} and
\textit{der} satisfies for every regular expression $r$ the following
two properties (see also Question 3):

\begin{itemize}
\item $\textit{nullable}(r)$ if and only if $[]\in L(r)$
\item $L(der\,c\,r) = Der\,c\,(L(r))$
\end{itemize}



\subsection*{Question 1 (Unmarked)}

What is your King's email address (you will need it in
Question 5)?

\subsection*{Question 2 (Unmarked)}

Can you please list all programming languages in which you have
already written programs (like spent at least a good working day
fiddling with the program or programs)?  This is just for my
curiosity to estimate what your background is.

\subsection*{Question 3}

From the
lectures you have seen the definitions for the functions
\textit{nullable} and \textit{der} for the basic regular
expressions. Implement and write down rules for the extended
regular expressions:

\begin{center}
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
  $\textit{nullable}([c_1,c_2,\ldots,c_n])$  & $\dn$ & $?$\\
  $\textit{nullable}(r^+)$                   & $\dn$ & $?$\\
  $\textit{nullable}(r^?)$                   & $\dn$ & $?$\\
  $\textit{nullable}(r^{\{n\}})$              & $\dn$ & $?$\\
  $\textit{nullable}(r^{\{..m\}})$            & $\dn$ & $?$\\
  $\textit{nullable}(r^{\{n..\}})$            & $\dn$ & $?$\\
  $\textit{nullable}(r^{\{n..m\}})$           & $\dn$ & $?$\\
  $\textit{nullable}(\sim{}r)$              & $\dn$ & $?$
\end{tabular}
\end{center}

\begin{center}
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
  $der\, c\, ([c_1,c_2,\ldots,c_n])$  & $\dn$ & $?$\\
  $der\, c\, (r^+)$                   & $\dn$ & $?$\\
  $der\, c\, (r^?)$                   & $\dn$ & $?$\\
  $der\, c\, (r^{\{n\}})$              & $\dn$ & $?$\\
  $der\, c\, (r^{\{..m\}})$           & $\dn$ & $?$\\
  $der\, c\, (r^{\{n..\}})$           & $\dn$ & $?$\\
  $der\, c\, (r^{\{n..m\}})$           & $\dn$ & $?$\\
  $der\, c\, (\sim{}r)$               & $\dn$ & $?$\\
\end{tabular}
\end{center}

\noindent
Remember your definitions have to satisfy the two properties

\begin{itemize}
\item $\textit{nullable}(r)$ if and only if $[]\in L(r)$
\item $L(der\,c\,r)) = Der\,c\,(L(r))$
\end{itemize}

\noindent
Given the definitions of \textit{nullable} and \textit{der}, it is
easy to implement a regular expression matcher.  Test your regular
expression matcher with (at least) the examples:


\begin{center}
\def\arraystretch{1.2}  
\begin{tabular}{r|m{12mm}|m{12mm}|m{12mm}|m{12mm}|m{12mm}|m{12mm}}
  string & $a^{\{3\}}$ & $(a^?)^{\{3\}}$ & $a^{\{..3\}}$ &
     $(a^?)^{\{..3\}}$ & $a^{\{3..5\}}$ & $(a^?)^{\{3..5\}}$\\\hline
  $[]$           &&&&&& \\\hline 
  \texttt{a}     &&&&&& \\\hline 
  \texttt{aa}    &&&&&& \\\hline 
  \texttt{aaa}   &&&&&& \\\hline 
  \texttt{aaaaa} &&&&&& \\\hline 
  \texttt{aaaaaa}&&&&&& \\
\end{tabular}
\end{center}

\noindent
Does your matcher produce the expected results?

\subsection*{Question 4}

As you can see, there are a number of explicit regular expressions
that deal with single or several characters, for example:

\begin{center}
\begin{tabular}{ll}
  $c$ & matches a single character\\  
  $[c_1,c_2,\ldots,c_n]$ & matches a set of characters---for character ranges\\
  $\textit{ALL}$ & matches any character
\end{tabular}
\end{center}

\noindent
The latter is useful for matching any string (for example
by using $\textit{ALL}^*$). In order to avoid having an explicit constructor
for each case, we can generalise all these cases and introduce a single
constructor $\textit{CFUN}(f)$ where $f$ is a function from characters
to booleans. In Scala code this would look as follows:

\begin{lstlisting}[numbers=none]
abstract class Rexp
...
case class CFUN(f: Char => Boolean) extends Rexp 
\end{lstlisting}\smallskip

\noindent
The idea is that the function $f$ determines which character(s)
are matched, namely those where $f$ returns \texttt{true}.
In this question implement \textit{CFUN} and define

\begin{center}
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
  $\textit{nullable}(\textit{CFUN}(f))$  & $\dn$ & $?$\\
  $\textit{der}\,c\,(\textit{CFUN}(f))$  & $\dn$ & $?$
\end{tabular}
\end{center}

\noindent in your matcher and then also give definitions for

\begin{center}
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
  $c$  & $\dn$ & $\textit{CFUN}(?)$\\
  $[c_1,c_2,\ldots,c_n]$  & $\dn$ & $\textit{CFUN}(?)$\\
  $\textit{ALL}$  & $\dn$ & $\textit{CFUN}(?)$
\end{tabular}
\end{center}

\noindent
You can either add the constructor $CFUN$ to your implementation in
Question 3, or you can implement this questions first
and then use $CFUN$ instead of \code{RANGE} and \code{CHAR} in Question 3.


\subsection*{Question 5}

Suppose $[a\mbox{-}z0\mbox{-}9\_\,.\mbox{-}]$ stands for the regular expression

\[[a,b,c,\ldots,z,0,\dots,9,\_,.,\mbox{-}]\;.\]

\noindent
Define in your code the following regular expression for email addresses

\[
([a\mbox{-}z0\mbox{-}9\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}})
\]

\noindent and calculate the derivative according to your own email
address. When calculating the derivative, simplify all regular
expressions as much as possible by applying the
following 7 simplification rules:

\begin{center}
\begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll}
$r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ 
$\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ 
$r \cdot \ONE$ & $\mapsto$ & $r$\\ 
$\ONE \cdot r$ & $\mapsto$ & $r$\\ 
$r + \ZERO$ & $\mapsto$ & $r$\\ 
$\ZERO + r$ & $\mapsto$ & $r$\\ 
$r + r$ & $\mapsto$ & $r$\\ 
\end{tabular}
\end{center}

\noindent Write down your simplified derivative in a readable
notation using parentheses where necessary. That means you
should use the infix notation $+$, $\cdot$, $^*$ and so on,
instead of raw code.\bigskip


\subsection*{Question 6}

Implement the simplification rules in your regular expression matcher.
Consider the regular expression $/ \cdot * \cdot
(\sim{}(\textit{ALL}^* \cdot * \cdot / \cdot \textit{ALL}^*)) \cdot *
\cdot /$ and decide whether the following four strings are matched by
this regular expression. Answer yes or no.

\begin{enumerate}
\item \texttt{"/**/"}
\item \texttt{"/*foobar*/"}
\item \texttt{"/*test*/test*/"}
\item \texttt{"/*test/*test*/"}
\end{enumerate}

\subsection*{Question 7}

Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be
$(a^{\{19,19\}}) \cdot (a^?)$.  Decide whether the following three
strings consisting of $a$s only can be matched by $(r_1^+)^+$.
Similarly test them with $(r_2^+)^+$. Again answer in all six cases
with yes or no. \medskip

\noindent
These are strings are meant to be entirely made up of $a$s. Be careful
when copy-and-pasting the strings so as to not forgetting any $a$ and
to not introducing any other character.

\begin{enumerate}
\setcounter{enumi}{4}
\item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
\item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
\item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
\end{enumerate}



\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: