pep-material: cws/cw03.tex@fb2d8012ed08


\documentclass{article}
\usepackage{../style}
%%\usepackage{../langs}

\begin{document}

\section*{Coursework 8 (Scala, Regular Expressions}

This coursework is worth 10\%. It is about regular expressions and
pattern matching. The first part is due on 30 November at 11pm; the
second, more advanced part, is due on 7 December at 11pm. The
second part is not yet included. For the first part you are
asked to implement a regular expression matcher. Make sure the files
you submit can be processed by just calling \texttt{scala
  <<filename.scala>>}.\bigskip

\noindent
\textbf{Important:} Do not use any mutable data structures in your
submission! They are not needed. This excludes the use of
\texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
code! It has a different meaning in Scala, than in Java.  Do not use
\texttt{var}! This declares a mutable variable.  Make sure the
functions you submit are defined on the ``top-level'' of Scala, not
inside a class or object.


\subsection*{Disclaimer!!!!!!!!}

It should be understood that the work you submit represents
your own effort! You have not copied from anyone else. An
exception is the Scala code I showed during the lectures or
uploaded to KEATS, which you can freely use.\bigskip


\subsection*{Part 1 (6 Marks)}

The task is to implement a regular expression matcher that is based on
derivatives of regular expressions. The implementation can deal
with the following regular expressions, which have been predefined
in the file re.scala:

\begin{center}
\begin{tabular}{lcll}
  $r$ & $::=$ & $\ZERO$     & cannot match anything\\
      &   $|$ & $\ONE$      & can only match the empty string\\
      &   $|$ & $c$         & can match a character (in this case $c$)\\
      &   $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\
  &   $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\
          &  & & then the second part with $r_2$\\
      &   $|$ & $r^*$       & can match zero or more times $r$\\
\end{tabular}
\end{center}

\noindent 
Why? Knowing how to match regular expressions and strings fast will
let you solve a lot of problems that vex other humans. Regular
expressions are one of the fastest and simplest ways to match patterns
in text, and are endlessly useful for searching, editing and
analysing text in all sorts of places. However, you need to be
fast, otherwise you will stumble over problems such as recently reported at

{\small
\begin{itemize}
\item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
\item[$\bullet$] \url{https://vimeo.com/112065252}
\item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/}  
\end{itemize}}

\subsection*{Tasks (file re.scala)}

\begin{itemize}
\item[(1a)] Implement a function, called \textit{nullable}, by recursion over
  regular expressions. This function tests whether a regular expression can match
  the empty string.

\begin{center}
\begin{tabular}{lcl}
$\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
$\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
$\textit{nullable}(c)$     & $\dn$ & $\textit{false}$\\
$\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
$\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
$\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
\end{tabular}
\end{center}\hfill[1 Mark]

\item[(1b)] Implement a function, called \textit{der}, by recursion over
  regular expressions. It takes a character and a regular expression
  as arguments and calculates the derivative regular expression according
  to the rules:

\begin{center}
\begin{tabular}{lcl}
$\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
$\textit{der}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
$\textit{der}\;c\;(d)$     & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
$\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
$\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
      & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
      & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
$\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\
\end{tabular}
\end{center}

For example given the regular expression $r = (a \cdot b) \cdot c$, the derivatives
w.r.t.~the characters $a$, $b$ and $c$ are

\begin{center}
  \begin{tabular}{lcll}
    $\textit{der}\;a\;r$ & $=$ & $(\ONE \cdot b)\cdot c$ & ($= r'$)\\
    $\textit{der}\;b\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$\\
    $\textit{der}\;c\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$
  \end{tabular}
\end{center}

Let $r'$ stand for the first derivative, then taking the derivatives of $r'$
w.r.t.~the characters $a$, $b$ and $c$ gives

\begin{center}
  \begin{tabular}{lcll}
    $\textit{der}\;a\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \\
    $\textit{der}\;b\;r'$ & $=$ & $((\ZERO \cdot b) + \ONE)\cdot c$ & ($= r''$)\\
    $\textit{der}\;c\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$
  \end{tabular}
\end{center}

One more example: Let $r''$ stand for the second derivative above,
then taking the derivatives of $r''$ w.r.t.~the characters $a$, $b$
and $c$ gives

\begin{center}
  \begin{tabular}{lcll}
    $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\
    $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\
    $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$
  \end{tabular}
\end{center}

Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
\mbox{}\hfill\mbox{[1 Mark]}

\item[(1c)] Implement the function \textit{simp}, which recursively
  traverses a regular expression from the inside to the outside, and
  simplifies every sub-regular-expression on the left (see below) to
  the regular expression on the right, except it does not simplify inside
  ${}^*$-regular expressions.

  \begin{center}
\begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
$r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ 
$\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ 
$r \cdot \ONE$ & $\mapsto$ & $r$\\ 
$\ONE \cdot r$ & $\mapsto$ & $r$\\ 
$r + \ZERO$ & $\mapsto$ & $r$\\ 
$\ZERO + r$ & $\mapsto$ & $r$\\ 
$r + r$ & $\mapsto$ & $r$\\ 
\end{tabular}
  \end{center}

  For example the regular expression
  \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]

  simplifies to just $r_1$. 
  \hfill[1 Mark]

\item[(1d)] Implement two functions: The first, called \textit{ders},
  takes a list of characters and a regular expression as arguments, and
  builds the derivative w.r.t.~the list as follows:

\begin{center}
\begin{tabular}{lcl}
$\textit{ders}\;(Nil)\;r$ & $\dn$ & $r$\\
  $\textit{ders}\;(c::cs)\;r$  & $\dn$ &
    $\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
\end{tabular}
\end{center}

The second, called \textit{matcher}, takes a string and a regular expression
as arguments. It builds first the derivatives according to \textit{ders}
and after that tests whether the resulting derivative regular expression can match
the empty string (using \textit{nullable}).
For example the \textit{matcher} will produce true given the
regular expression $(a\cdot b)\cdot c$ and the string $abc$.
\hfill[1 Mark]

\item[(1e)] Implement the function $\textit{replace}\;r\;s_1\;s_2$: it searches
  (from the left to 
right) in the string $s_1$ all the non-empty substrings that match the 
regular expression $r$---these substrings are assumed to be
the longest substrings matched by the regular expression and
assumed to be non-overlapping. All these substrings in $s_1$ matched by $r$
are replaced by $s_2$. For example given the regular expression

\[(a \cdot a)^* + (b \cdot b)\]

\noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and
replacement string $s_2 = c$ yields the string

\[
ccbcabcaccc
\]

\hfill[2 Mark]
\end{itemize}

\subsection*{Part 2 (4 Marks)}

Coming soon.

\end{document}


%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End:
author	Christian Urban <urbanc@in.tum.de>
	Sat, 26 Nov 2016 17:02:52 +0000
changeset 74	fb2d8012ed08
parent 69	f1295a0ab4ed
child 75	71e463b33a9e
permissions	-rw-r--r--