pep-material: comparison cws/cw03.tex

equal deleted inserted replaced

-:114a89518aea
+:4383809c176a
 \documentclass{article}
 \usepackage{../style}
 \usepackage{../langs}
+\usepackage{tikz}
+\usepackage{pgf}
+\usepackage{pgfplots}
+\begin{filecontents}{re-python2.data}
+1 0.033
+5 0.036
+10 0.034
+15 0.036
+18 0.059
+19 0.084
+20 0.141
+21 0.248
+22 0.485
+23 0.878
+24 1.71
+25 3.40
+26 7.08
+27 14.12
+28 26.69
+\end{filecontents}
+\begin{filecontents}{re-java.data}
+5  0.00298
+10  0.00418
+15  0.00996
+16  0.01710
+17  0.03492
+18  0.03303
+19  0.05084
+20  0.10177
+21  0.19960
+22  0.41159
+23  0.82234
+24  1.70251
+25  3.36112
+26  6.63998
+27  13.35120
+28  29.81185
+\end{filecontents}
 \begin{document}
-\section*{Coursework 8 (Scala, Regular Expressions)}
+\section*{Coursework 8 (Scala, Regular Expressions, Brainf***)}
 This coursework is worth 10\%. It is about regular expressions,
-pattern matching and polymorphism. The first part is due on 30
+pattern matching and an interpreter. The first part is due on 30
 November at 11pm; the second, more advanced part, is due on 21
-December at 11pm. You are asked to implement a regular expression
+December at 11pm. In the first part, you are asked to implement a
-matcher based on derivatives of regular expressions. The reason is
+regular expression matcher based on derivatives of regular
-that regular expression matching in Java can be extremely slow
+expressions. The reason is that regular expression matching in Java
-sometimes.\bigskip
+can sometimes be extremely slow. The advanced part is about an
+interpreter for a very simple programming language.\bigskip
 \noindent
 \textbf{Important:}
 \begin{itemize}
 \subsection*{Part 1 (6 Marks)}
 The task is to implement a regular expression matcher that is based on
-derivatives of regular expressions. The implementation can deal
+derivatives of regular expressions. Most of the functions are defined by
-with the following regular expressions, which have been predefined
+recursion over regular expressions and can be elegantly implemented
-in the file re.scala:
+using Scala's pattern-matching. The implementation should deal with the
+following regular expressions, which have been predefined in the file
+\texttt{re.scala}:
 \begin{center}
 \begin{tabular}{lcll}
 $r$ & $::=$ & $\ZERO$     & cannot match anything\\
 &   $|$ & $\ONE$      & can only match the empty string\\
 \subsubsection*{Tasks (file re.scala)}
 \begin{itemize}
 \item[(1a)] Implement a function, called \textit{nullable}, by
 recursion over regular expressions. This function tests whether a
-regular expression can match the empty string, that is given a
+regular expression can match the empty string. This means given a
 regular expression it either returns true or false.
 \begin{center}
 \begin{tabular}{lcl}
 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
 \mbox{}\hfill\mbox{[1 Mark]}
 \item[(1c)] Implement the function \textit{simp}, which recursively
 traverses a regular expression from the inside to the outside, and
-simplifies every sub-regular-expression on the left (see below) to
+on the way simplifies every regular expression on the left (see
-the regular expression on the right, except it does not simplify inside
+below) to the regular expression on the right, except it does not
-${}^*$-regular expressions.
+simplify inside ${}^*$-regular expressions.
 \begin{center}
 \begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\
 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\
 \end{center}
 For example the regular expression
 \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
-simplifies to just $r_1$. \textbf{Hints:} Regular expressions can be
+simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be
 seen as trees and there are several methods for traversing
-trees. One of them corresponds to the inside-out traversal. Also
+trees. One of them corresponds to the inside-out traversal, which is
-remember numerical expressions from school: there you had expressions
+sometimes also called post-order traversal. Furthermore,
+remember numerical expressions from school times: there you had expressions
 like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$
 and simplification rules that looked very similar to rules
 above. You would simplify such numerical expressions by replacing
 for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
 look whether more rules are applicable. If you organise the
 simplification in an inside-out fashion, it is always clear which
-rule should applied next.\\\mbox{}\hfill[1 Mark]
+rule should be applied next.\hfill[2 Marks]
 \item[(1d)] Implement two functions: The first, called \textit{ders},
 takes a list of characters and a regular expression as arguments, and
 builds the derivative w.r.t.~the list as follows:
 The second function, called \textit{matcher}, takes a string and a
 regular expression as arguments. It builds first the derivatives
 according to \textit{ders} and after that tests whether the resulting
 derivative regular expression can match the empty string (using
 \textit{nullable}).  For example the \textit{matcher} will produce
-true given the regular expression $(a\cdot b)\cdot c$ and the string
+true for the regular expression $(a\cdot b)\cdot c$ and the string
-$abc$.\\  \mbox{}\hfill[1 Mark]
+$abc$, but false if you give it the string $ab$. \hfill[1 Mark]
-\item[(1e)] Implement the function $\textit{replace}\;r\;s_1\;s_2$: it searches
+\item[(1e)] Implement a function, called \textit{size}, by recursion
-(from the left to
-right) in the string $s_1$ all the non-empty substrings that match the
-regular expression $r$---these substrings are assumed to be
-the longest substrings matched by the regular expression and
-assumed to be non-overlapping. All these substrings in $s_1$ matched by $r$
-are replaced by $s_2$. For example given the regular expression
-\[(a \cdot a)^* + (b \cdot b)\]
-\noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and
-replacement the string $s_2 = c$ yields the string
-\[
-ccbcabcaccc
-\]
-\hfill[2 Marks]
-\end{itemize}
-\subsection*{Part 2 (4 Marks)}
-You need to copy all the code from \texttt{re.scala} into
-\texttt{re2.scala} in order to complete Part 2.  Parts (2a) and (2b)
-give you another method and datapoints for testing the \textit{der}
-and \textit{simp} functions from Part~1.
-\subsubsection*{Tasks (file re2.scala)}
-\begin{itemize}
-\item[(2a)] Write a \textbf{polymorphic} function, called
-\textit{iterT}, that is \textbf{tail-recursive}(!) and takes an
-integer $n$, a function $f$ and an $x$ as arguments. This function
-should iterate $f$ $n$-times starting with the argument $x$, like
-\[\underbrace{f(\ldots (f}_{n\text{-times}}(x)))
-\]
-More formally that means \textit{iterT} behaves as follows:
-\begin{center}
-\begin{tabular}{lcl}
-$\textit{iterT}(n, f, x)$ & $\dn$ &
-$\begin{cases}
-\;x & \textit{if}\;n = 0\\
-\;f(\textit{iterT}(n - 1, f, x)) & \textit{otherwise}
-\end{cases}$
-\end{tabular}
-\end{center}
-Make sure you write a \textbf{tail-recursive} version of
-\textit{iterT}.  If you add the annotation \texttt{@tailrec} (see
-below) your code should not produce an error message.
-\begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm]
-import scala.annotation.tailrec
-@tailrec
-def iterT[A](n: Int, f: A => A, x: A): A = ...
-\end{lstlisting}
-You can assume that \textit{iterT} will only be called for positive
-integers $0 \le n$. Given the type variable \texttt{A}, the type of
-$f$ is \texttt{A => A} and the type of $x$ is \texttt{A}. This means
-\textit{iterT} can be used, for example, for functions from integers
-to integers, or strings to strings, or regular expressions to
-regular expressions.  \\ \mbox{}\hfill[2 Marks]
-\item[(2b)] Implement a function, called \textit{size}, by recursion
 over regular expressions. If a regular expression is seen as a tree,
 then \textit{size} should return the number of nodes in such a
 tree. Therefore this function is defined as follows:
 \begin{center}
 $\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
 $\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\
 \end{tabular}
 \end{center}
-You can use \textit{size} and \textit{iterT} in order to test how much
+You can use \textit{size} in order to test how much the `evil' regular
-the 'evil' regular expression $(a^*)^* \cdot b$ grows when taking
+expression $(a^*)^* \cdot b$ grows when taking successive derivatives
-successive derivatives according the letter $a$ and then compare it to
+according the letter $a$ without simplification and then compare it to
-taking the derivative, but simlifying the derivative after each step.
+taking the derivative, but simplify the result.  The sizes
-For example, the calls
+are given in \texttt{re.scala}. \hfill[1 Mark]
+\end{itemize}
-\begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm]
-size(iterT(20, (r: Rexp) => der('a', r), EVIL))
-size(iterT(20, (r: Rexp) => simp(der('a', r)), EVIL))
-\end{lstlisting}
-produce without simplification a regular expression of size of
-7340068 after 20 iterations, while the one with
-simplification gives
-just 8.\\ \mbox{}\hfill[1 Mark]
-\item[(2c)] Write a \textbf{polymorphic} function, called
-\textit{fixpT}, that takes
-a function $f$ and an $x$ as arguments. The purpose
-of \textit{fixpT} is to calculate a fixpoint of the function $f$
-starting from the argument $x$.
-A fixpoint, say $y$, is when $f(y) = y$ holds.
-That means \textit{fixpT} behaves as follows:
-\begin{center}
-\begin{tabular}{lcl}
-$\textit{fixpT}(f, x)$ & $\dn$ &
-$\begin{cases}
-\;x & \textit{if}\;f(x) = x\\
-\;\textit{fixpT}(f, f(x)) & \textit{otherwise}
-\end{cases}$
-\end{tabular}
-\end{center}
-Make sure you calculate in the code of $\textit{fixpT}$ the result
-of $f(x)$ only once. Given the type variable \texttt{A} in
-$\textit{fixpT}$, the type of $f$ is \texttt{A => A} and the type of
-$x$ is \texttt{A}. The file \texttt{re2.scala} gives two example
-function where in one the fixpoint is 1 and in the other
-it is the string $a$.\\ \mbox{}\hfill[1 Mark]
-\end{itemize}\bigskip
 \subsection*{Background}
 Although easily implementable in Scala, the idea behind the derivative
 function might not so easy to be seen. To understand its purpose
 better, assume a regular expression $r$ can match strings of the form
 $c\!::\!cs$ (that means strings which start with a character $c$ and have
-some rest, or tail, $cs$). If you now take the derivative of $r$ with
+some rest, or tail, $cs$). If you take the derivative of $r$ with
-respect to the character $c$, then you obtain a regular expressions
+respect to the character $c$, then you obtain a regular expression
 that can match all the strings $cs$.  In other words, the regular
-expression $\textit{der}\;c\;r$ can match the same strings $c::cs$
+expression $\textit{der}\;c\;r$ can match the same strings $c\!::\!cs$
 that can be matched by $r$, except that the $c$ is chopped off.
 Assume now $r$ can match the string $abc$. If you take the derivative
 according to $a$ then you obtain a regular expression that can match
 $bc$ (it is $abc$ where the $a$ has been chopped off). If you now
-build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ you
+build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r)$ you
 obtain a regular expression that can match the string $c$ (it is $bc$
 where $b$ is chopped off). If you finally build the derivative of this
 according $c$, that is
-$\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r)))$, you
+$\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r))$, you obtain
-obtain a regular expression that can match the empty string. You can
+a regular expression that can match the empty string. You can test
-test this using the function nullable, which is what your matcher is
+whether this is indeed the case using the function nullable, which is
-doing.
+what your matcher is doing.
-The purpose of the simp function is to keep the regular expression
+The purpose of the $\textit{simp}$ function is to keep the regular
-small. Normally the derivative function makes the regular expression
+expression small. Normally the derivative function makes the regular
-bigger (see the SEQ case and the example in (2b)) and the algorithm
+expression bigger (see the SEQ case and the example in (1b)) and the
-would be slower and slower over time. The simp function counters this
+algorithm would be slower and slower over time. The $\textit{simp}$
-increase in size and the result is that the algorithm is fast
+function counters this increase in size and the result is that the
-throughout.  By the way, this algorithm is by Janusz Brzozowski who
+algorithm is fast throughout.  By the way, this algorithm is by Janusz
-came up with the idea of derivatives in 1964 in his PhD thesis.
+Brzozowski who came up with the idea of derivatives in 1964 in his PhD
+thesis.
 \begin{center}\small
 \url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)}
 \end{center}
+If you want to see how badly the regular expression matchers do in
+Java and Python with the `evil' regular expression, then have a look
+at the graphs below (you can try it out for yourself: have a look at
+the file \texttt{catastrophic.java} on KEATS). Compare this with the
+matcher you have implemented. How long can the string of $a$'s be
+in your matcher and stay within the 30 seconds time limit?
+\begin{center}
+\begin{tikzpicture}
+\begin{axis}[
+title={Graph: $(a^*)^*\cdot b$ and strings
+$\underbrace{a\ldots a}_{n}$},
+xlabel={$n$},
+x label style={at={(1.05,0.0)}},
+ylabel={time in secs},
+enlargelimits=false,
+xtick={0,5,...,30},
+xmax=33,
+ymax=35,
+ytick={0,5,...,30},
+scaled ticks=false,
+axis lines=left,
+width=6cm,
+height=5.0cm,
+legend entries={Python, Java},
+legend pos=outer north east]
+\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
+\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
+\end{axis}
+\end{tikzpicture}
+\end{center}
+\newpage
+\subsection*{Part 2 (4 Marks)}
+Comming from Java or C++, you might think Scala is a quite
+esotheric programming language.  But remember, some serious companies
+have built their business on Scala. And there are far more esotheric
+languages out there. One is called \emph{brainf***}. Urban M\"uller
+developed this language in 1993.  A close relative was already
+introduced in ... by Corado B\"ohm, an Italian computer pionier, who
+unfortunately died a few months ago. One feature of brainf*** is its
+minimalistic set of instructions. It has just 8 instructions, all of
+which are single characters. Despite this minimalism, this language,
+given enough memory, has been shown to be Turing complete. In this
+part you will implement an interpreter for this language.
+\subsubsection*{Tasks (file bf.scala)}
+\begin{itemize}
+\item[(2a)]
+\item[(2b)]
+\item[(2c)]
+\end{itemize}\bigskip
 \end{document}
 %%% Local Variables:

changeset 153	4383809c176a
parent 152	114a89518aea
child 154	39c6b93718f0